Heart disease is common and high-risk, and it seriously threatens human health. In the era of the Internet of Things (IoT), smart medical devices are of practical value to medical workers and patients because they can assist in the diagnosis of diseases. Research on real-time diagnosis and classification algorithms for arrhythmia can therefore help to improve diagnostic efficiency. In this paper, we design an automatic arrhythmia classification model based on a Convolutional Neural Network (CNN) and an encoder-decoder model. The model uses Long Short-Term Memory (LSTM) to account for the influence of time-series features on the classification results, and it is trained and tested on the MIT-BIH arrhythmia database. In addition, a Generative Adversarial Network (GAN) is adopted for data equalization to address the data imbalance problem. The simulation results show that, for inter-patient arrhythmia classification, the hybrid model combining the CNN and the encoder-decoder model has the best classification accuracy, reaching 94.05%. It is particularly advantageous for classifying supraventricular ectopic beats (class S) and fusion beats (class F).

In recent years, as the standard of living has improved, people have paid more and more attention to their own health. Early detection and treatment of disease place higher demands on medical workers and the corresponding equipment. Among various diseases, heart disease is not only more likely to occur but also poses a greater threat to human life. As a common examination method, electrocardiography (ECG) reflects the state of the heart at every moment and is an important reference for diagnosis. However, interpreting an ECG still requires experienced medical staff to diagnose the pathology accurately. Therefore, using intelligent devices and related algorithms to monitor a patient's heartbeat state in real time has strong practical significance and is an active research topic.

A traditional automatic arrhythmia classification algorithm can be divided into four parts: data acquisition, preprocessing, feature extraction and classification. In the data acquisition part, ECG signals can be collected with medical diagnostic equipment. However, because patient privacy must be protected, most research uses open arrhythmia data sets. The preprocessing part mainly analyzes and filters the noise in the ECG signal, which improves the efficiency of the subsequent classification. Common types of noise include baseline drift, power-frequency interference and electromyographic (EMG) interference [

In the traditional classifier implementation, the output characteristic value of the former three steps is often taken as the input of the classifier. And the classification model is constructed by relevant algorithm to complete the automatic classification of arrhythmia. Salam et al. [

According to the characteristics of data, deep learning can accomplish two functions of feature extraction and classification, which can avoid complex feature extraction engineering to a certain extent and reduce the impact of manually extracting eigenvalues on classification effects. Ali Isin et al. [

However, the above methods do not take advantage of the time series characteristics of the ECG. Because Long Short-Term Memory (LSTM) is an improvement on the Recurrent Neural Network (RNN), it makes good use of temporal characteristics and is well suited to solving sequence problems [

In this paper, we build our models around solving these three problems. For the imbalance of data, Generative Adversarial Networks (GAN) can learn the data distribution through the game between a generator and a discriminator, and they have already achieved good results in the field of image data enhancement [

The rest of this paper is organized as follows. The database and the data preprocessing are presented in Section 2. The theoretical description of the arrhythmia classification model constructed in this paper is presented in Section 3. The experimental results are presented in Section 4. Finally, the conclusion is given in Section 5.

Because patient information is highly private, we use the public MIT-BIH arrhythmia database as the data set for training and testing in this paper. The database contains 48 ECG records collected from 47 subjects, each lasting approximately 30 minutes [

As is shown in

In this section, we mainly complete two aspects of work, including heart beat segmentation and dataset partitioning.

In the heartbeat segmentation part, we extract the individual heartbeats. In the MIT-BIH database, the data are organized in records, each of which contains multiple heartbeats. Since this paper focuses on the classification of arrhythmia, the model is built directly on individual heartbeats, which also benefits training by increasing the total amount of data.

In the MIT-BIH arrhythmia database, the annotation file contains the manually labeled R-peak positions, which makes it convenient for researchers to segment the heartbeats. We take this file as a reference and complete the heartbeat segmentation by taking the corresponding samples to the left and right of each R peak. The specific implementation is as follows. The 'R-R interval' is defined as the samples between two adjacent R-peak positions, so the samples in this interval can be divided into two parts to obtain the samples of an individual heartbeat. We take the R-peak position of each heartbeat as the center; 45% of the samples are taken from the interval on the left, while 55% of the samples are taken from the interval on the right to complete the segmentation of the heartbeat, which is shown in

In the data set division, we divide the data into training set and testing set based on the inter-patient heartbeats, which can improve the scalability of the classification model. We divide all records into two categories according to the existing proposed data division method [

In addition, in order to improve the generalization of the algorithm, we use a linear function to normalize each heartbeat, which is beneficial to the training of the model. Finally, a series of 1*64 heartbeat data is obtained, which is directly used for model construction.
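
The segmentation and normalization steps described above can be sketched as follows. This is a minimal illustration, not the exact implementation used in this paper. It assumes that the 45% and 55% fractions refer to the preceding and following R-R intervals, that each beat is resampled to 64 points by linear interpolation, and that the linear normalization is min-max scaling to [0, 1]. The function names `segment_beats`, `resample` and `normalize` are hypothetical.

```python
import math


def resample(values, out_len):
    """Linearly interpolate `values` onto `out_len` evenly spaced points (assumed method)."""
    n = len(values)
    if n == 1:
        return [float(values[0])] * out_len
    out = []
    for k in range(out_len):
        pos = k * (n - 1) / (out_len - 1)
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out


def normalize(values):
    """Min-max scaling to [0, 1]; one possible choice of linear normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def segment_beats(signal, r_peaks, left_frac=0.45, right_frac=0.55, out_len=64):
    """Cut one beat around every R peak that has a neighbor on both sides."""
    beats = []
    for prev_r, r, next_r in zip(r_peaks, r_peaks[1:], r_peaks[2:]):
        start = r - int(round(left_frac * (r - prev_r)))   # 45% of the left R-R interval
        stop = r + int(round(right_frac * (next_r - r)))   # 55% of the right R-R interval
        beats.append(normalize(resample(signal[start:stop], out_len)))
    return beats


# Toy signal and hypothetical R-peak positions, for illustration only.
toy_signal = [math.sin(i / 10.0) for i in range(1000)]
toy_r_peaks = [100, 300, 520, 700]
toy_beats = segment_beats(toy_signal, toy_r_peaks)
print(len(toy_beats), len(toy_beats[0]))
```

The first and last R peaks are skipped because they lack a neighboring interval on one side; a real pipeline would need its own rule for those beats.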

In this paper, a hybrid model combining a CNN with an encoder-decoder model is designed based on the characteristics of ECG signals, and a GAN is used for data enhancement. The classification model is built on the feature extraction ability of the CNN and the time-series feature extraction ability of the LSTM, which is shown in

The preprocessing part has already been described in Section 2, and the principles of the later parts of the model are described in this section. Since most of the networks used in this paper are constructed based on CNN and LSTM, in this section, we first briefly introduce the basic principles of CNN and LSTM.

CNN is mainly composed of convolution layer, pooling layer and fully connected layer. The feature value is extracted by convolution layer and pooling layer, and finally the classification output result is obtained through the fully connected layer.

The convolutional layer is the core of the CNN, and the feature extraction is mainly performed by the convolution kernel. By convolving the input data with the kernel function, the corresponding feature map is obtained. Different convolution filters correspond to different feature values, and finally the output of the layer is obtained by the activation function. The specific expression result of the convolutional layer is represented by the weight

The pooling layer, also known as the downsampling layer, is mainly used to reduce data parameters. The pooling operation achieves dimensionality reduction of data and combines low-level local features into higher-level features. Similarly, taking the

The function of the fully connected layer is to integrate the abstract features of the preceding layer and then send the output values to the classifier for classification. After being flattened into one dimension, the feature data is sent directly into the fully connected layer, which completes the mapping between the feature values and the output categories. For the fully connected layer, a commonly used model is the Multi-Layer Perceptron (MLP). In addition, when improving the model, an SVM is often used to replace the MLP to improve the classification effect.
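
To make the roles of the three layer types concrete, the following is a minimal sketch of one convolution, activation, pooling and fully connected pass on a one-dimensional input. It is an illustration only: the ReLU activation, the unpadded convolution, and all weights and values are assumptions chosen for the example, and the function names are hypothetical.

```python
def conv1d(x, kernel, bias=0.0):
    """Unpadded 1-D convolution with stride 1 (cross-correlation form, as in most CNN libraries)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(x):
    """Activation function applied to the feature map (ReLU is an assumed choice)."""
    return [max(0.0, v) for v in x]


def max_pool1d(x, size=2, stride=2):
    """Downsampling: keep the maximum of each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


x = [0, 1, 2, 3, 2, 1, 0, 1]                    # toy input, not real ECG data
feature_map = relu(conv1d(x, [1, 0, -1]))       # convolution followed by activation
pooled = max_pool1d(feature_map)                # pooling halves the length
scores = dense(pooled, [[1, 1, 1], [0.5, 0, -0.5]], [0.0, 0.0])
print(feature_map, pooled, scores)
```

With a single channel the pooled feature map is already one-dimensional; with several kernels the per-channel maps would be concatenated (flattened) before the fully connected layer.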

The LSTM is an improved network based on the RNN that mainly adds three logic gates: the forget gate, the input gate and the output gate. Each hidden state

The forget gate primarily determines how much of the previous cell state is retained in the current state. It can be expressed as

The input gate primarily determines how much of the current time input value is retained as the actual input to the current state, which can be expressed as follows

The output gate mainly determines how much of the state at this moment is retained as the output passed to the next time step, which can be expressed as

Similar to the CNN, the LSTM training method is to propagate the error through the loss function, and calculate the partial derivative of weights to obtain the final classification model.
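
As a point of reference for the three gates, a single LSTM time step in its widely used standard form can be sketched as follows. This is the textbook formulation and not necessarily the exact variant used in this paper. The parameter names (`W_f`, `b_f`, and so on) and the function `lstm_step` are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, z):
    """Compute W z + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a standard LSTM cell; p is a dict of weights and biases."""
    z = h_prev + x_t                                             # concatenation [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(p["W_f"], p["b_f"], z)]      # forget gate
    i = [sigmoid(v) for v in affine(p["W_i"], p["b_i"], z)]      # input gate
    g = [math.tanh(v) for v in affine(p["W_c"], p["b_c"], z)]    # candidate cell state
    o = [sigmoid(v) for v in affine(p["W_o"], p["b_o"], z)]      # output gate
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Toy parameters: one hidden unit, one input feature, all weights zero.
zero = {name: [[0.0, 0.0]] for name in ("W_f", "W_i", "W_c", "W_o")}
zero.update({name: [0.0] for name in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step([1.0], [0.0], [2.0], zero)
print(h_t, c_t)
```

With all weights at zero every gate equals 0.5 and the candidate state is 0, so the cell state is simply halved; this makes the effect of the forget gate on the previous cell state easy to see.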

In the data equalization part, we mainly address the large difference in the amount of data between different types of heartbeats. According to the data set division method and the AAMI arrhythmia classification method, the amount of data differs greatly between categories, which makes the model difficult to train. In order to improve the accuracy of model training and balance the number of heartbeats as much as possible, we use the GAN to realize data equalization.

The GAN model is mainly composed of the generator

In the training process, we use the gradient descent method to optimize the

Since there is no clear definition of the specific implementation of

For the discriminator D, a three-layer one-dimensional convolution operation is used to obtain the convolution feature, and the corresponding output is obtained through the logistic regression layer. For the generator G, the model uses the full connection, upsampling and convolution operations for the input noise vector to learn the original data distribution. The composition of the model is shown in the

In this paper, the generator is updated once for every five discriminator updates. A DCGAN model is constructed and trained separately for the class S, class V, and class F heartbeat data. Using the trained models, we generate data whose distribution is as similar as possible to that of the original data, which achieves a relative balance of the classification data. The data distribution of DS1 after using DCGAN is shown in

Class | Original data | Enhanced data |
---|---|---|
N | 45866 | 45866 |
S | 944 | 44991 |
V | 3788 | 44992 |
F | 415 | 44989 |
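
The effect of the equalization can be read directly from the class counts above. The short calculation below uses only the numbers in the table; the variable names are chosen for illustration.

```python
# Class counts copied from the table of original and enhanced data.
original = {"N": 45866, "S": 944, "V": 3788, "F": 415}
enhanced = {"N": 45866, "S": 44991, "V": 44992, "F": 44989}

# Number of synthetic beats added per class.
generated = {c: enhanced[c] - original[c] for c in original}

# Ratio between the largest and the smallest class, before and after.
ratio_before = max(original.values()) / min(original.values())
ratio_after = max(enhanced.values()) / min(enhanced.values())

print(generated)
print(round(ratio_before, 2), round(ratio_after, 3))
```

Class N is left unchanged, while roughly 41,000 to 45,000 generated beats are added to each minority class, which reduces the largest-to-smallest class ratio from about 110 to about 1.02.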

As a commonly used deep learning algorithm, the CNN mainly performs feature extraction and classification of the target data. The differences in waveform between heartbeat categories provide the basis for this feature extraction.

Through the previous design and simulation, we design a simple CNN model in this paper. The model only contains four layers of convolution and pooling operations, simplifying the complexity of the model without affecting the effect of classification. The CNN structure used in this paper is shown in

Layer function | Kernel | Stride | Number of kernels |
---|---|---|---|
Convolution1 | 1 * 3 | 1 | 32 |
Maxpooling1 | 1 * 2 | 2 | 32 |
Convolution2 | 1 * 3 | 1 | 64 |
Maxpooling2 | 1 * 2 | 2 | 64 |
Convolution3 | 1 * 3 | 1 | 128 |
Maxpooling3 | 1 * 2 | 2 | 128 |
Convolution4 | 1 * 3 | 1 | 256 |
Maxpooling4 | 1 * 2 | 2 | 256 |
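
To see how the layers above shrink a 1*64 heartbeat, the feature-map length after each layer can be traced with the usual output-length formula. The padding of the convolutions is not stated in the table, so the sketch below treats it as an assumption and shows both a padding of 1 (length-preserving for a kernel of 3) and no padding. The names `LAYERS`, `out_len` and `trace_shapes` are hypothetical.

```python
# (layer name, kernel width, stride, number of kernels), copied from the table.
LAYERS = [
    ("Convolution1", 3, 1, 32), ("Maxpooling1", 2, 2, 32),
    ("Convolution2", 3, 1, 64), ("Maxpooling2", 2, 2, 64),
    ("Convolution3", 3, 1, 128), ("Maxpooling3", 2, 2, 128),
    ("Convolution4", 3, 1, 256), ("Maxpooling4", 2, 2, 256),
]


def out_len(n, kernel, stride, padding=0):
    """Standard output length of a 1-D convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_shapes(input_len=64, conv_padding=1):
    """Return (layer, channels, length) after every layer; conv_padding is an assumption."""
    shapes = []
    n = input_len
    for name, kernel, stride, channels in LAYERS:
        padding = conv_padding if name.startswith("Convolution") else 0
        n = out_len(n, kernel, stride, padding)
        shapes.append((name, channels, n))
    return shapes


for name, channels, length in trace_shapes():
    print(f"{name:13s} -> {channels} x {length}")
```

Under the padded assumption the final feature map is 256 x 4 (1024 values after flattening); without padding it is 256 x 2 (512 values).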

The encoder-decoder model is a common framework that is widely used for sequence-to-sequence problems. The ECG signal is essentially a time-based sequence signal, similar to a speech signal, so the encoder-decoder model can serve as a classification model for the ECG signal because it is well suited to time series. In this paper, a classifier based on the encoder-decoder model is designed for the feature information extracted by the CNN, so that its time series features are further considered. The specific principle and implementation of the model are described below.

The encoder-decoder model consists of three parts: the encoder, the semantic vector and the decoder. The encoder encodes the input information and condenses it into a memory representation, the semantic vector. The decoder takes the semantic vector as its initial input state and completes the semantic transformation through the corresponding decoding algorithm. The specific implementation of the encoder and decoder is flexible. The optional models include CNN, RNN, Bi-directional Recurrent Neural Network (BiRNN) and LSTM. There is no uniform specification for the encoder and decoder algorithms, so a variety of encoder-decoder models can be constructed from different combinations of algorithms.

In this paper, we combine the CNN with the encoder-decoder model based on LSTM to construct the classification model. The original input signal is passed through the CNN model, and the feature extraction is performed to generate the corresponding feature variable as the input of the encoder-decoder model. And the classification of the arrhythmia is realized by the LSTM model. The structure of this model is shown in the

Encoder: The LSTM model structure is used. The input is the training result of the fully connected layer in the CNN model, which means the feature vector. And the output is the semantic vector representation of the corresponding target value, which is used to initialize the decoder input.

C: Represents a semantic vector. The encoder encodes any length of sequence information into a fixed length of context information vector as input to the decoder.

Decoder: The LSTM model structure is used. The input is the semantic vector output by the encoder, and the output is the vector corresponding to the target. It is then converted to probability values by the softmax function, and the different types of arrhythmia are generated one by one.
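
The final conversion from the decoder output vector to class probabilities can be illustrated with a numerically stable softmax. The output vector below is a made-up example, and the ordering of the four classes is an assumption for the illustration.

```python
import math


def softmax(logits):
    """Convert a score vector to probabilities; subtracting the maximum avoids overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["N", "S", "V", "F"]            # assumed class order
decoder_output = [2.0, 0.5, 1.0, -1.0]    # hypothetical decoder output, not real data
probs = softmax(decoder_output)
predicted = classes[probs.index(max(probs))]
print([round(p, 3) for p in probs], predicted)
```

The predicted class is the one with the highest probability; the probabilities always sum to one.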

In this paper, abnormal ECG heartbeats are classified by a hybrid model based on the encoder-decoder model. To compare the classification effect, three different algorithms are used to classify ECG arrhythmia: a one-dimensional CNN model (1-D CNN), a combination of CNN and SVM (CNN+SVM), and the encoder-decoder model combined with the CNN model (CNN+ED). We use the MIT-BIH arrhythmia database as the input data set. In addition, to measure the effect of data equalization, the 1-D CNN is used as the classifier and its results with and without data equalization are compared. After applying the preprocessing described above, the data are sent to the models for training and testing.

For evaluation, we measure the simulation results for the four types of arrhythmia with four standard metrics based on the confusion matrix: classification accuracy (ACC), sensitivity (TPR), specificity (TNR) and positive predictivity (PPV). We also use the F1-score to evaluate the classification effect for the different types. The definitions of these five metrics in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) are expressed as follows
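
The five metrics can be computed per class from a multi-class confusion matrix by treating each class in turn as the positive class (one-vs-rest). The sketch below uses the standard definitions of these metrics; the confusion matrix is a made-up toy example, and the function names are hypothetical.

```python
def one_vs_rest_metrics(cm, labels):
    """cm[i][j] = number of beats with true class i that were predicted as class j."""
    total = sum(sum(row) for row in cm)

    def ratio(num, den):
        return num / den if den else 0.0    # guard against empty classes

    results = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(row[k] for row in cm) - tp
        tn = total - tp - fn - fp
        tpr = ratio(tp, tp + fn)            # sensitivity
        ppv = ratio(tp, tp + fp)            # positive predictivity
        results[label] = {
            "ACC": ratio(tp + tn, total),
            "TPR": tpr,
            "TNR": ratio(tn, tn + fp),      # specificity
            "PPV": ppv,
            "F1": ratio(2 * ppv * tpr, ppv + tpr),
        }
    return results


def overall_accuracy(cm):
    """Fraction of all beats that were assigned to their true class."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


toy_cm = [[80, 15, 5], [4, 14, 2], [1, 1, 8]]   # toy counts, not results from this paper
metrics = one_vs_rest_metrics(toy_cm, ["N", "S", "V"])
print(metrics["S"], overall_accuracy(toy_cm))
```

Note that the per-class ACC counts true negatives as correct, so the average of the per-class ACC values is generally higher than the overall accuracy computed from the diagonal of the confusion matrix.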

Firstly,

Method | ACC | TPR | TNR | PPV |
---|---|---|---|---|
| | 93.53 | 50.68 | 89.76 | 42.79 |

The results in the

Through the training and test process of the model, the simulation results of the three models are shown in

Comparing the average results of the metrics for the three models shows that the basic 1-D CNN model has good average accuracy and specificity, both about 90%, but worse average sensitivity and positive predictivity. The CNN+SVM model improves the overall classification accuracy, but the improvement in the individual metrics is not obvious. The CNN+ED model improves all the metrics to a greater degree, and each of them exceeds 70%. The average accuracy and specificity improve by about 3% and 6%, respectively, while the average sensitivity and positive predictivity increase the most, by up to 20%.

Method | ACC | TPR | TNR | PPV |
---|---|---|---|---|
| | 94.45 | 53.23 | 90.41 | 46.56 |
| | 96.20 | 48.77 | 88.72 | 50.35 |

Method | ACC | TPR | TNR | PPV |
---|---|---|---|---|

97.53 | 59.06 | 92.21 | 66.22 | |

97.03 |

Class | ACC | TPR | TNR | PPV |
---|---|---|---|---|
N | 94.22 | 94.62 | 91.00 | 98.85 |
S | 94.83 | 85.73 | 95.18 | 40.6 |
V | 99.85 | 97.79 | 99.99 | 99.87 |
F | 99.20 | 36.41 | 99.68 | 46.94 |
Ave | 97.03 | 78.64 | 96.46 | 71.56 |
Accuracy | 94.05 |

Besides,

Method | ACC |
---|---|
SVM | 92.07 |
PSO+SVM | 89.10 |
1D-CNN | 93.47 |
CNN-ED | 93.80 |
DCGAN-CNN-ED | 94.05 |

In this paper, a hybrid model combining a CNN with an encoder-decoder model is designed for the classification of arrhythmia, and a GAN is used for data equalization. The classification effect is verified on inter-patient heartbeat data. The simulation results show that the classification model constructed in this paper performs well, especially for class S and class F, and its accuracy reaches 94.05%. All CNN models used in this paper have four convolutional layers, and their accuracy can be improved by combining them with other learning models, which avoids more complex convolution operations to some extent. In summary, we have completed the classification of arrhythmia with automatically extracted feature parameters, which is conducive to the auxiliary treatment of heart disease.