Anomaly detection in high-dimensional data is a critical research issue with serious implications for real-world problems. Many issues in this field remain unsolved, and several modern anomaly detection methods struggle to maintain adequate accuracy due to the highly descriptive nature of big data. This phenomenon, referred to as the "curse of dimensionality", degrades both the accuracy and the performance of traditional techniques. Thus, this research proposes a hybrid model based on a Deep Autoencoder Neural Network (DANN) with five layers that reduces the difference between the input and the output. The proposed model was applied to a real-world gas turbine (GT) dataset that contains 87620 columns and 56 rows. During the experiments, two issues were investigated and resolved to enhance the results. The first is the class imbalance of the dataset, which was solved using the SMOTE technique. The second is poor performance, which can be addressed with an optimization algorithm. Several optimization algorithms were investigated and tested, including stochastic gradient descent (SGD), RMSprop, Adam, and Adamax; the Adamax algorithm yielded the best results when employed to train the DANN model. The experimental results show that the proposed model can detect anomalies by efficiently reducing the high dimensionality of the dataset, with an accuracy of 99.40%, an F1-score of 0.9649, an Area Under the Curve (AUC) rate of 0.9649, and a minimal loss function during hybrid model training.

Nowadays, a huge amount of data is produced periodically at an unparalleled speed from diverse and composite origins such as social media, sensors, telecommunication, financial transactions, etc. [

Anomaly detection refers to the challenge of detecting patterns in data that do not conform to anticipated behavior [

Several anomaly detection techniques have been proposed across different application domains [

Motivated by the preceding problems, this study proposes a novel hybrid deep learning-based approach for anomaly detection in large-scale datasets. Specifically, a data sampling method and a multi-layer deep autoencoder with the Adamax optimization algorithm are proposed. The Synthetic Minority Over-sampling Technique (SMOTE) is used as the data sampling method to resolve the inherent class imbalance problem by augmenting the number of minority class instances to the level of the majority class. A novel deep autoencoder neural network (DANN) with the Adamax optimization algorithm is used for detecting anomalies and reducing dimensionality. The primary contributions of this work are summarized as follows:

A novel DANN approach to detect anomalies in time series in an unsupervised mode.

Hybridization of SMOTE data sampling and the DANN to overcome the inherent class imbalance problem.

We addressed and overcame the curse of dimensionality in data by applying a multilayer autoencoder model that can find optimal parameter values and minimize the difference between the input and the output using deep reconstruction error during the model training.

The rest of this paper is structured as follows. Section 2 highlights the background and related work. Section 3 outlines the research methodology, while Section 4 describes the experimental findings. Lastly, Section 5 concludes the paper and highlights future work.

Anomaly detection is a well-known issue in a variety of fields, and different approaches have been proposed recently to mitigate it. Further information about this issue can be found in [

One of the commonly used anomaly detection techniques is the neighbor-based technique, whereby outliers are identified based on neighborhood information. The anomaly score is computed as the average or weighted distance between a data object and its k nearest neighbors [
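To make the neighbor-based idea concrete, the following is a minimal, illustrative sketch (not taken from the paper) that scores each point by the average distance to its k nearest neighbors and flags the highest-scoring point as the most anomalous:

```python
import numpy as np

def knn_anomaly_scores(X, k=3):
    """Score each point by the average distance to its k nearest neighbors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)            # exclude each point's distance to itself
    knn = np.sort(d, axis=1)[:, :k]        # k smallest distances per point
    return knn.mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(50, 2))     # a tight Gaussian cluster
X[-1] = [8.0, 8.0]                         # inject one obvious outlier
scores = knn_anomaly_scores(X, k=3)
print(int(np.argmax(scores)))              # the injected outlier scores highest
```

The quadratic pairwise-distance matrix is only suitable for small samples; for larger data a spatial index (e.g. a k-d tree) would normally replace it.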

Another applicable approach is deploying the subspace learning method. Subspace-based anomaly detection approaches try to locate anomalies by sifting through various subsets of dimensions in an orderly manner. According to Zimek et al. [

Ensemble learning is another feasible anomaly detection approach, which can be attributed to its efficiency over baseline methods [

This section describes the gas turbine (GT) dataset, the real-world data utilized for anomaly detection in a high-dimensional setting. It also discusses the various techniques used for dimensionality reduction and feature optimization, and the different stages of the proposed hybrid model.

The dataset used in this research is real high-dimensional industrial data for a gas turbine. The data contains 87620 columns and 56 rows. In this study, the data was split into a training set and a testing set with a ratio of 60:40. Detecting anomalies in real-world high-dimensional data is both a theoretical and a practical challenge due to the "curse of dimensionality" issue, which is widely discussed in the literature [
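A minimal sketch of such a 60:40 split (illustrative only; the paper does not specify the GT preprocessing beyond the ratio, so the toy matrix and seed below are ours):

```python
import numpy as np

def train_test_split_60_40(X, seed=42):
    """Shuffle row indices and split the data 60:40 into train and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random row order, reproducible via seed
    cut = int(0.6 * len(X))                # 60% boundary
    return X[idx[:cut]], X[idx[cut:]]

X = np.arange(100).reshape(50, 2)          # toy stand-in for the GT data matrix
train, test = train_test_split_60_40(X)
print(train.shape, test.shape)             # (30, 2) (20, 2)
```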

An autoencoder is a particular type of artificial neural network utilized primarily for handling tasks of unsupervised machine learning [

The proposed DANN has the following mathematical model form:

Given an input x ∈ R^{m}, the encoder maps x to a latent representation h = f(Wx + b) ∈ R^{d} with d < m, and the decoder maps h back to a reconstruction x̂ = g(W′h + b′) ∈ R^{m}. The model is trained to minimize the squared difference between the input and the output, L(x, x̂) = ‖x − x̂‖^{2},

which is also called the reconstruction error.
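A toy numeric sketch of this objective, using a single-hidden-layer linear autoencoder trained by plain gradient descent (an illustration of the reconstruction-error principle only, not the paper's five-layer model; all sizes and hyperparameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))              # toy data: 200 samples, 8 features

d = 3                                      # bottleneck (latent) dimension
W_enc = rng.normal(scale=0.1, size=(8, d))
W_dec = rng.normal(scale=0.1, size=(d, 8))

def recon_loss(X, We, Wd):
    X_hat = (X @ We) @ Wd                  # encode, then decode
    return float(np.mean((X - X_hat) ** 2))

loss_before = recon_loss(X, W_enc, W_dec)
lr = 0.01
for _ in range(500):                       # gradient descent on the MSE
    H = X @ W_enc
    E = H @ W_dec - X                      # reconstruction error term
    g_dec = (H.T @ E) / len(X)
    g_enc = (X.T @ (E @ W_dec.T)) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
loss_after = recon_loss(X, W_enc, W_dec)
print(loss_after < loss_before)            # training shrinks the reconstruction error
```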

Apart from the reconstruction error specified in

The authors in [

Based on the above, we designed a similar procedure for dimensionality reduction utilizing the DANN model. First, the matrix of the original data is partitioned into two sets containing only normal operating data: one for training and the other for testing the DANN model. Second, the autoencoder neural network is trained using the training dataset. Once trained, the autoencoder computes the principal components and residuals for each new data sample fed to it. This is followed by determining the T^{2} and Q statistics as follows:

T^{2} = Σ_{k}(t^{k}/σ^{k})^{2} and Q = ‖x − x̂‖^{2}, where t^{k} denotes the value of the k^{th} principal component in the latest data sample, and σ^{k} denotes the standard deviation of the k^{th} principal component as determined from the training dataset. It is worth mentioning that upper control limits are conventionally set by assuming that the data comply with a multivariate normal distribution. A different approach was followed in this work: the upper control limits of the two statistics were calculated directly from the given large dataset, without assuming any particular distribution form. For instance, with a hundred samples of normal training data, the next biggest T^{2} (or Q) value is chosen as the upper control limit to attain a false alarm rate of 0.01.
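The two statistics and the empirical control limit can be sketched as follows, using a toy orthonormal projection in place of the trained autoencoder (all data and names here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
W = np.linalg.qr(rng.normal(size=(5, 2)))[0]    # toy orthonormal "encoder" (5 -> 2)
X_train = rng.normal(size=(500, 5))             # stand-in for normal operating data

T = X_train @ W                                 # latent (principal-component) scores
sigma = T.std(axis=0)                           # per-component standard deviations

def t2_q(x):
    t = x @ W
    t2 = float(np.sum((t / sigma) ** 2))        # Hotelling-style T^2 statistic
    q = float(np.sum((x - t @ W.T) ** 2))       # Q: squared reconstruction residual
    return t2, q

# Empirical upper control limits for a 0.01 false-alarm rate: the 99th
# percentile of each statistic over normal training data, with no
# distributional assumption.
stats = np.array([t2_q(x) for x in X_train])
t2_limit, q_limit = np.quantile(stats, 0.99, axis=0)
print(float(np.mean(stats[:, 0] > t2_limit)))   # observed false-alarm rate
```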

Resampling the data, including undersampling and oversampling, is one of the prominent approaches to relieve the issue of imbalanced datasets [
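A minimal, illustrative implementation of the SMOTE interpolation step follows (for real use, a library such as imbalanced-learn would normally be preferred; sample sizes here are made up):

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating each chosen
    sample toward one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # ignore self-distance
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest minority neighbors
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))            # pick a random minority sample
        j = rng.choice(nn[i])                   # pick one of its neighbors
        lam = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.random.default_rng(2).normal(size=(10, 4))
X_new = smote_oversample(X_min, n_new=40)       # e.g. balance 10 vs. 50 majority
print(X_new.shape)                              # (40, 4)
```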

Adam [

The algorithm makes use of adaptive learning rate techniques to determine the learning rate for each parameter individually. The Adam algorithm is extremely efficient when dealing with complex problems involving a large number of variables or records; it is reliable and needs less memory. It is a combination of the 'gradient descent with momentum' and 'RMSprop' methods. The momentum method accelerates the gradient descent algorithm by taking the 'exponentially weighted average' of the gradients into account. In addition, it utilizes the advantages of Adagrad [

Hence, the momentum update is m_{t} = β m_{t−1} + (1 − β)(∂L/∂W_{t}) and W_{t+1} = W_{t} − α_{t} m_{t},

where m_{t} denotes the aggregate of gradients at time t (present), m_{t−1} is the aggregate of gradients at time t−1 (prior), W_{t} is the weights at time t, W_{t+1} is the weights at time t+1, α_{t} is the learning rate at time t, ∂L/∂W_{t} is the derivative of the loss function with respect to the weights at time t, and β is the moving average parameter.
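The momentum update described above can be illustrated on a toy one-dimensional objective L(w) = w^2 (the example, including the hyperparameter values, is ours, not the paper's):

```python
def momentum_step(w, grad, m_prev, alpha=0.01, beta=0.9):
    """One step of gradient descent with momentum."""
    m = beta * m_prev + (1 - beta) * grad   # m_t = beta*m_{t-1} + (1-beta)*dL/dW_t
    return w - alpha * m, m                 # W_{t+1} = W_t - alpha_t * m_t

# Minimize L(w) = w^2 (so dL/dw = 2w), starting from w = 5.0
w, m = 5.0, 0.0
for _ in range(300):
    w, m = momentum_step(w, 2 * w, m)
print(abs(w) < 1e-2)                        # converged close to the minimum at 0
```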

RMSprop is an adaptive learning-rate method that attempts to improve upon AdaGrad. Rather than accumulating the sum of squared gradients as AdaGrad does, it computes an 'exponential moving average'.

Therefore, V_{t} = β V_{t−1} + (1 − β)(∂L/∂W_{t})^{2} and W_{t+1} = W_{t} − (α_{t}/√V_{t})(∂L/∂W_{t}),

where W_{t} is the weights at time t, W_{t+1} is the weights at time t+1, α_{t} is the learning rate at time t, ∂L/∂W_{t} is the derivative of the loss function with respect to the weights at time t, V_{t} is the exponential moving average of the squared past gradients, and β is the moving average parameter.
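The RMSprop update can likewise be sketched on the same toy objective L(w) = w^2 (hyperparameters are illustrative; a small ε is added in practice to avoid division by zero):

```python
def rmsprop_step(w, grad, v_prev, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSprop step: scale the update by the root of an exponential
    moving average of squared gradients (instead of AdaGrad's running sum)."""
    v = beta * v_prev + (1 - beta) * grad ** 2   # V_t
    return w - alpha * grad / (v ** 0.5 + eps), v

# Minimize L(w) = w^2 (dL/dw = 2w), starting from w = 5.0
w, v = 5.0, 0.0
for _ in range(1000):
    w, v = rmsprop_step(w, 2 * w, v)
print(abs(w) < 0.1)
```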

where β_{1} and β_{2} are the exponential decay rates of the first and second moment estimates, respectively.
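Since Adamax is the optimizer that ultimately performed best in this work, a sketch of its standard update rule may be useful: it replaces Adam's second-moment estimate with an infinity-norm running maximum. The toy objective and hyperparameter values below are illustrative:

```python
def adamax_step(w, grad, m, u, t, alpha=0.002, beta1=0.9, beta2=0.999):
    """One Adamax step: Adam's second-moment estimate is replaced by the
    infinity-norm running maximum u_t = max(beta2*u_{t-1}, |g_t|)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    u = max(beta2 * u, abs(grad))                # infinity-norm second moment
    w = w - (alpha / (1 - beta1 ** t)) * m / u   # bias-corrected update
    return w, m, u

# Minimize L(w) = w^2 (dL/dw = 2w), starting from w = 5.0
w, m, u = 5.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, u = adamax_step(w, 2 * w, m, u, t)
print(abs(w) < 0.1)
```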

This section summarizes the experimental findings and discusses their significance for the different approaches including DANN with Adam optimizer, DANN with SGD optimizer, DANN with RMSprop optimizer, and DANN with Adamax optimizer.

No. | Deep autoencoder | Deep autoencoder with Adam optimizer | Deep autoencoder with SGD optimizer | Deep autoencoder with RMSprop optimizer | Deep autoencoder with Adamax optimizer |
---|---|---|---|---|---|
1 | 0.9874 | 0.9619 | 0.8779 | 0.5191 | 0.9845 |
2 | 0.9112 | 0.9681 | 0.8616 | 0.9371 | 0.9948 |
3 | 0.9149 | 0.9760 | 0.6811 | 0.9756 | 0.9963 |
4 | 0.9649 | 0.9804 | 0.9094 | 0.9896 | 0.9778 |
5 | 0.9660 | 0.9871 | 0.8280 | 0.9922 | 0.9985 |
6 | 0.9486 | 0.9825 | 0.8779 | 0.9885 | 1.0000 |
7 | 0.9804 | 0.9858 | 0.9049 | 0.9937 | 0.9993 |
8 | 0.9567 | 0.9869 | 0.8835 | 0.9945 | 0.9904 |
9 | 0.9848 | 0.9889 | 0.9131 | 0.9959 | 0.9993 |
10 | 0.9767 | 0.9919 | 0.9619 | 0.9974 | 1.0000 |
Average | 0.9591 | 0.9736 | 0.8699 | 0.9383 | 0.9940 |

As depicted in

The v_{t} term in the Adam update rule scales the gradient inversely proportionally to the ℓ2 norm of the past gradients (through the v_{t−1} term) and the current gradient g_{t}^{2}, as presented in

Five measurement metrics are utilized to evaluate the performance of our experiment: Accuracy, Precision, Recall rate, F1-Score, and receiver operating characteristics (ROC). Accuracy is defined as the proportion of correctly classified samples and has the following formula:

Precision is defined as the proportion of all samples classified as Category-A that truly belong to it. In general, the higher the precision, the lower the system's False Alarm Rate (FAR).

The recall rate indicates the proportion of all actual Category-A samples that are classified as such. The recall rate measures a system's capability to detect anomalies: the greater it is, the more anomalous traffic is correctly identified.

The F1-score enables the combination of precision and recall into a single metric that encompasses both properties.

TP, FP, TN, FN represent True Positive, False Positive, True Negative and False Negative, respectively.

Accuracy is the most widely used metric for models trained on balanced datasets. It indicates the fraction of correctly predicted samples out of the total number of samples evaluated by the model.

F1-score is frequently employed in circumstances where an optimum integration of precision and recall is necessary. It is the harmonic mean of precision and recall scores of a model. Thus, the F1 score can be defined as given in
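These definitions can be computed directly from the confusion-matrix counts. The following sketch (with made-up labels and predictions) illustrates all four metrics:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from TP, FP, TN, FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0]          # made-up ground truth (1 = anomaly)
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]          # made-up predictions
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(acc, prec, rec, f1)                  # 0.75, 2/3, 2/3, 2/3
```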

A receiver operating characteristics (ROC) is a method for organizing, visualizing, and selecting classification models based on their performance [
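The AUC that summarizes a ROC curve equals the probability that a randomly chosen positive sample is ranked above a randomly chosen negative one. A small illustrative sketch (labels and scores are made up):

```python
def auc_score(y_true, scores):
    """AUC = probability that a random positive outranks a random negative
    (ties count half), which equals the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]         # anomaly scores from some detector
auc = auc_score(y, s)
print(auc)                                 # 8 of 9 positive/negative pairs ordered correctly
```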

The AUC values of the five proposed models in this analysis are presented in the legend portion of

Cross-entropy is often utilized as a loss function when optimizing classification models. It is extremely useful in binary classification problems that involve predicting a class label from one or more input variables. Our model attempts to estimate the target probability distribution Q as closely as possible. Thus, we can estimate the cross-entropy for an anomaly prediction in high-dimensional data using the following cross-entropy calculation:

Predicted P(class0) = 1 − ŷ

Predicted P(class1) = ŷ

This implies that the model explicitly predicts the probability of class 1, while the probability of class 0 is given as one minus the expected probability.
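A small illustrative sketch of this binary cross-entropy calculation (the labels and predicted probabilities are made up):

```python
import math

def binary_cross_entropy(y_true, p_class1, eps=1e-12):
    """Mean cross-entropy when the model outputs P(class1);
    P(class0) is taken as 1 - P(class1)."""
    total = 0.0
    for y, p in zip(y_true, p_class1):
        p = min(max(p, eps), 1.0 - eps)    # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]                      # made-up ground-truth labels
p_hat = [0.9, 0.1, 0.8, 0.2]               # made-up predicted P(class1)
loss = binary_cross_entropy(y_true, p_hat)
print(round(loss, 4))                      # 0.1643
```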

For anomaly detection in the high-dimensional industrial gas turbine dataset, we were unable to find any directly comparable research contribution; therefore, we compared our results with two recently proposed approaches for anomaly detection in high-dimensional datasets [

As presented in

Prediction model | Precision | Recall | F1-score | AUC | Accuracy% |
---|---|---|---|---|---|
Proposed DANN | 0.9112 | 0.9760 | 0.9376 | 0.943 | 95.91 |
Proposed DANN with Adam optimizer | 0.9874 | 0.9619 | 0.9811 | 0.951 | 97.36 |
Proposed DANN with SGD optimizer | 0.8711 | 0.9170 | 0.8823 | 0.893 | 86.99 |
Proposed DANN with RMSprop optimizer | 0.9244 | 0.9404 | 0.8280 | 0.916 | 93.83 |
Proposed DANN with Adamax optimizer | 0.9660 | 0.9718 | 0.9649 | 0.981 | 99.40 |
Song et al. [ | 0.7172 | 0.7171 | 0.7171 | Not Reported | Not Reported |
Fawcett [ | Not Reported | Not Reported | Not Reported | 0.95 | Not Reported |

This study proposed an efficient and improved deep autoencoder-based anomaly detection approach for a real industrial gas turbine dataset. The proposed approach aims at improving the accuracy of anomaly detection by reducing the dimensionality of the large gas turbine data. The proposed deep autoencoder neural network (DANN) was integrated and tested with several well-known optimization methods for the training process. The proposed DANN approach was able to overcome the curse of dimensionality effectively, and it was evaluated and validated using commonly used evaluation measures. The DANN with the Adamax optimization method achieved the best performance, with an accuracy of 99.40%, an F1-score of 0.9649, and an AUC rate of 0.9649, while the DANN with the SGD optimization method obtained the worst anomaly detection performance on the high-dimensional dataset.