Structural Health Monitoring (SHM) systems have become a crucial tool for the operational management of long tunnels. For immersed tunnels exposed to both traffic loads and the marine environment, efficiently identifying abnormal conditions in extensive unannotated SHM data presents a significant challenge. This study proposes a model-based approach for anomaly detection and validates and compares two distinct temporal predictive models using SHM data from a real immersed tunnel. First, a dynamic predictive model-based anomaly detection method is proposed, which uses a rolling time window for modeling to achieve dynamic prediction. Under the assumption of temporal data similarity, the deviation of an observation from the predicted interval is used to determine whether the data are abnormal. Subsequently, dynamic predictive models are constructed based on the Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) models. The hyperparameters of these models are optimized and selected using monitoring data from the immersed tunnel, yielding viable static and dynamic predictive models. Finally, the models are applied to the same segment of SHM data to validate the effectiveness of the anomaly detection approach based on dynamic predictive modeling, and a detailed comparative analysis discusses the discrepancies in temporal anomaly detection between the ARIMA- and LSTM-based models. The results demonstrate that the dynamic predictive model-based anomaly detection approach is effective for unannotated SHM data. In the comparison, ARIMA shows higher modeling efficiency, making it suitable for short-term predictions. In contrast, the LSTM model shows a greater capacity to capture long-term performance trends and enhanced early warning capabilities, resulting in superior overall performance.

To prevent catastrophic failures of immersed tunnels caused by the deterioration of structural performance, establishing a Structural Health Monitoring (SHM) system has emerged as an effective solution [

The rapidly advancing digital transformation in infrastructure has thrust data-driven anomaly detection methods into the research spotlight. Autoregressive Support Vector Machines, as proposed in [

However, for unannotated SHM data, the previously proposed supervised learning approaches have been inadequate because the algorithms cannot learn anomaly patterns from limited examples. Currently, very few studies combine the mechanism of anomaly patterns with comprehensive data analysis [

The crux of the model-based anomaly detection approach lies in constructing a prediction model, which falls into two categories. The first category comprises classical time series models such as Autoregressive (AR) model, Moving Average (MA) model, Autoregressive Moving Average (ARMA) model, Autoregressive Integrated Moving Average (ARIMA) model, and Seasonal Autoregressive Integrated Moving Average (SARIMA) model [

The second category involves deep learning-based time series prediction methods [

Despite these advancements, limited research has systematically compared classical time series models and deep learning models for anomaly detection [

This section outlines a dynamic model-based approach for anomaly detection. The procedure involves several key steps. Initially, the data collected by the SHM system undergoes preprocessing, which includes format transformation and wavelet threshold denoising. After obtaining standardized and denoised data, a prediction model is constructed using both the ARIMA and LSTM methods. The fundamental concept behind the approach is for the model to capture the normal behavior of the time series. Consequently, if observations deviate significantly from the predictions, indicating a violation of time continuity, they are labeled as anomalies. In essence, if the prediction error falls outside a defined confidence interval, the observation is considered abnormal, leading to the issuance of hierarchical warning signals based on the degree of deviation. The flow chart of the procedure is depicted in

It is noteworthy that the method employs a rolling single-step prediction, continually advancing the time window while incorporating new observations into the model input. However, as time progresses, the model parameters need updating as the initial model fitted to previous data becomes less suitable for accurate predictions. Updating the model at every second is impractical due to time constraints. Thus, the paper introduces two metrics to govern when model updates occur: average deviation size and duration of model use. These metrics dictate parameter updates if the average error over a recent period exceeds an acceptable threshold or if the time interval since the last update surpasses a predefined maximum. This approach is based on the assumption that the presence of relatively few anomalies allows for their deviations to minimally impact subsequent model predictions. Additionally, incorporating time control acknowledges the evolving nature of the model over time.
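The two update triggers described above can be sketched as a small bookkeeping class. This is only an illustration under stated assumptions: the class name `UpdateScheduler`, its parameters, and the use of the mean absolute error over a fixed-length window are hypothetical choices, not the implementation used in this study.

```python
from collections import deque


class UpdateScheduler:
    """Decide when a rolling prediction model should be refitted.

    Hypothetical helper: refit when the mean absolute error over the most
    recent window exceeds a threshold, or when the model is too old.
    """

    def __init__(self, error_threshold, max_age, error_window):
        self.error_threshold = error_threshold   # acceptable mean absolute error
        self.max_age = max_age                   # maximum steps between refits
        self.recent_errors = deque(maxlen=error_window)
        self.age = 0                             # steps since the last refit

    def record(self, prediction, observation):
        """Store the latest one-step error and report whether to refit."""
        self.recent_errors.append(abs(observation - prediction))
        self.age += 1
        window_full = len(self.recent_errors) == self.recent_errors.maxlen
        mean_error = sum(self.recent_errors) / len(self.recent_errors)
        too_inaccurate = window_full and mean_error > self.error_threshold
        too_old = self.age >= self.max_age
        return too_inaccurate or too_old

    def reset(self):
        """Call after the model has been refitted."""
        self.recent_errors.clear()
        self.age = 0
```

Averaging over a window, rather than reacting to a single large error, reflects the assumption stated above that isolated anomalies should not by themselves force a model update.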

Moreover, the confidence interval for prediction results should be tailored to the external environment and tunnel structure. Drawing inspiration from the PauTa Criterion or 3σ rule, the threshold can be determined based on historical data within a specific period. This statistical approach requires a sufficiently lengthy historical reference period to ensure a roughly normal distribution of sample data. However, the chosen period should not be excessively long, as a fixed threshold is only accurate under stable conditions. Based on the results of repeated attempts and considering that the dynamic prediction model updates this threshold to a reasonable range, this study adopts the statistical results of 1 h of SHM data to calculate the initial threshold value. However, it should be noted that this statistical-based threshold-setting approach may be influenced by data distribution skewness or overly stable conditions. Further studies and validations are needed to establish the optimal threshold-setting methods.
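As a minimal sketch of the PauTa-style threshold, the snippet below derives bounds from a block of historical values and checks a new value against them. The function names and the use of the population standard deviation are assumptions made for illustration; in this study the history would be 1 h of SHM data.

```python
import statistics


def sigma_threshold(history, k=3.0):
    """Return (lower, upper) bounds as mean -/+ k standard deviations."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)  # population standard deviation (assumed)
    return mean - k * std, mean + k * std


def is_abnormal(value, bounds):
    """Flag a value that falls outside the given bounds."""
    lower, upper = bounds
    return value < lower or value > upper
```

With `k=3.0` this is the classical 3σ rule; larger multiples give stricter criteria that flag fewer values.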

The ARIMA model is a widely used classical time series prediction model, typically denoted as ARIMA (p, d, q), where p signifies the autoregressive parameter reflecting lag observations, d is the number of times that a raw sequence is differenced, and q indicates the moving average parameter denoting the window length. The developments of static and dynamic ARIMA models in this study were the same as those reported by Chen et al. [

Although classical models like ARIMA excel in time series prediction, their emphasis on linear relationships can constrain the predicted value distribution. Long Short-Term Memory (LSTM) is a widely used artificial neural network for time series modeling. The LSTM model “learns” from historical monitoring sequences and aims to predict the subsequent sequence value. Inputs consist of sequence values within a defined time window, and the goal is to predict the value immediately following the window.

The fundamental structure of LSTM mirrors the Recurrent Neural Network (RNN). The LSTM's distinct feature lies in its capacity to consider not only the current input but also the outputs of previous time steps. This enables the network to retain its previous state and enhances its capacity to learn long-term dependencies within the sequence. The network’s structure, as depicted in

Unlike traditional RNNs, which struggle with long-term dependencies due to vanishing or exploding gradients, LSTM incorporates cell states and gate functions to handle such issues. The forget gate, input gate, and output gate govern the retention or removal of information and cell state modification. The mathematical formulations of these gates and the LSTM unit structure are provided in

(1) Forget gate:

(2) Input gate:

(3) Output gate:

(4) Cell state:

(5) Output value:
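For reference, the five items above correspond to the standard LSTM unit, which is commonly written as follows. The symbols are assumptions, since the original notation is not available here: $x_t$ is the input, $h_t$ the output value, $c_t$ the cell state, $W$ and $b$ the weights and biases of each gate, $\sigma$ the sigmoid function, and $\odot$ the element-wise product.

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
$$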

In detail, the sigmoid function is used as the activation function for the three gate functions, and the hyperbolic tangent function is used as the activation function for the cell state, which regulates the amount of information obtained. When moving forward from time

As depicted in

The Hong Kong-Zhuhai-Macao Bridge (HZMB), an expansive 55 km-long project spanning Lingdingyang Bay, comprises three integral components: the main project of the bridge, the island, and the undersea tunnel. The information on the undersea immersed tunnel can be obtained from the literature [

The standard tunnel element is 180 m long, consisting of eight segments each with a length of 22.5 m. The cross-sectional dimensions of the immersed tunnel are shown in

An SHM system has been implemented on the HZMB. The SHM system acquires five types of monitoring data: ground motion, joint deformation, concrete strain, temperature, and humidity, as shown in

Monitoring item | Data | Sensors | Installation location |
---|---|---|---|
Structural responses | Ground motion | 3D accelerometer | |
Structural responses | Strain of element | FBG strain sensor | |
Structural responses | Joint deformation | Displacement meter | |
Environmental loads | Temperature | Thermometer | |

Noise is inevitable in the SHM system because of environmental reasons or unstable installation. Such factors culminate in suboptimal data quality.

This study adopts the wavelet threshold denoising method to eliminate noise. Specifically, the original data are decomposed into five layers using Symlet 12 as the mother wavelet; the details can be found in the literature [

For the ARIMA model, the time series should be stationary and should not be white noise after differencing. Because ARIMA is suited to short-term prediction and usually draws on fewer than 10 historical observations (so that p and q are predominantly below 10), this study adopts a dataset of 100 observations (equivalent to a 100-second time window) to develop the model. As an illustrative example, a time series of denoised concrete strain data from the HZMB immersed tunnel is employed, as depicted in

[Figure panels: timing graph, ACF plot, and PACF plot for the initial series, the first differenced series, and the second differenced series]

The Augmented Dickey-Fuller (ADF) test was employed to statistically evaluate the stationarity of the data, and the results are presented in

Series type | Test statistic | 5% critical value | p-value | Test result |
---|---|---|---|---|
Initial series | 0.1798 | −2.8950 | 0.9711 | Nonstationary |
First differenced series | −1.1433 | −2.8958 | 0.6975 | Nonstationary |
Second differenced series | −3.6732 | −2.8962 | 0.0045 | Stationary |

Finally, the Ljung-Box test was used to check whether the second differenced series was white noise. The resulting p-value was on the order of 10^{−22}, so the white-noise hypothesis was rejected and the series was suitable for further modeling.
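The differencing step and the Ljung-Box statistic can be illustrated with a short, dependency-free sketch. The function names are hypothetical, and only the Q statistic is computed; the p-value additionally requires the chi-square distribution, which a statistics library would normally supply.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing to a sequence."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box Q = n (n + 2) * sum_k rho_k^2 / (n - k), k = 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


print(difference([1, 4, 9, 16, 25], d=2))  # [2, 2, 2]
```

A large Q relative to the chi-square critical value with `max_lag` degrees of freedom indicates that the series is not white noise and therefore still contains structure worth modeling.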

The process of model identification is to set the number of Autoregressive (AR) and Moving Average (MA) terms by an optimization calculation. This study employs the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to automatically select p and q, utilizing a 100-second time window of SHM data. As evident from the results presented in

Test data | Order (p, d, q) by AIC | Order (p, d, q) by BIC |
---|---|---|
Data [0:100] | 5, 2, 0 | 2, 2, 0 |
Data [10000:10100] | 5, 2, 0 | 5, 2, 0 |
Data [20000:20100] | 5, 2, 0 | 5, 2, 0 |
Data [30000:30100] | 5, 2, 0 | 5, 2, 0 |
Data [40000:40100] | 5, 2, 0 | 5, 2, 0 |
Data [50000:50100] | 2, 2, 0 | 2, 2, 0 |
Data [60000:60100] | 1, 2, 1 | 1, 2, 1 |
Data [70000:70100] | 5, 2, 0 | 2, 2, 0 |

Thus, the ARIMA-based model's formulation is as follows:
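For orientation, an ARIMA (5, 2, 0) model, the order selected most often above, has the following general form. The notation is an assumption rather than the original equation: $y_t$ is the observed series, $B$ the backshift operator ($B y_t = y_{t-1}$), $\phi_i$ the autoregressive coefficients, and $\varepsilon_t$ a white-noise error term; a constant term, if any, is omitted.

$$
\left(1 - \sum_{i=1}^{5} \phi_i B^{i}\right)\left(1 - B\right)^{2} y_t = \varepsilon_t
$$

Equivalently, the twice-differenced series $w_t = y_t - 2y_{t-1} + y_{t-2}$ follows $w_t = \sum_{i=1}^{5} \phi_i w_{t-i} + \varepsilon_t$.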

The model parameters were estimated by Maximum Likelihood Estimation (MLE).

The model's significance was assessed and illustrated in

In this study, a model error monitoring approach is adopted to determine when model updates are required. The system automatically updates parameters based on the latest 100 s of observations. Test results revealed that selecting an update threshold of 5 × 10^{−8} prompted the model to update 29 times in a day. The error sequence exhibited steady fluctuations throughout the day, as depicted in

When utilizing filtered data to construct an LSTM model, a noticeable lag in predicted values occurs when the data experiences rapid rises or falls (refer to

During prediction for time

There are two ways to resolve this lag. One is to apply a nonlinear function, such as the square, square root, or logarithm, to the sequence. The other is to difference the sequence until it is stationary. The first method changes the original sequence more radically. Moreover, it relies on the constructed nonlinear function not being recognized by the LSTM regressor; based on the experience of previous studies, it has a high probability of failure and usually involves trying many different nonlinear functions [

Due to the iterative optimization of network parameters based on training set errors, model errors within the training set typically appear lower than actual errors. Consequently, sample data is divided into a training set and a test set. Striking a balance between training duration and model accuracy, this paper assigned 5/6 of the data instances to the training set and reserved the remaining 1/6 for the testing set. With a window width of 3600 s, the initial 3000 s were allocated for the training set, while the final 600 s formed the test set. The partitioning outcomes are depicted in

Training samples should be transferred to a standard form so that they can be learned by neural networks. According to

The main hyperparameters of LSTM include the number of hidden layers, the number of units at each layer, time window length, batch size, and number of epochs. To reduce the time of tuning, the model structure was set as a two-layer LSTM network with a batch size of 50 and an epoch number of 50. A discrete grid was set for the remaining hyperparameters, and a grid search was conducted to find the satisfying combination of hyperparameter values. Specifically, each grid value combination was used to train the network. The optimal performance combination was selected by evaluating the Mean Squared Error (MSE) of the model on the test set.

Because the weights are randomly initialized, LSTM training is not deterministic: the model's outcome varies even when the training data remain unchanged. To obtain a reliable performance estimate, each hyperparameter combination is trained ten times, and the performance metric is the overall mean of the absolute error on the test set.
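The grid search with repeated training can be sketched as follows. This is an illustrative outline only: `grid_search` and the `score_fn(params, seed)` callback are hypothetical names, and the callback stands in for training the two-layer LSTM once with a given random seed and returning its test-set error.

```python
import itertools
import statistics


def grid_search(grid, score_fn, repeats=10):
    """Return the parameter combination with the lowest mean score.

    grid:     dict mapping a hyperparameter name to its candidate values
    score_fn: callable(params, seed) -> test error of one training run
    repeats:  number of runs averaged per combination
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, seed) for seed in range(repeats)]
        mean_score = statistics.fmean(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Averaging over several runs per combination keeps a single lucky or unlucky initialization from deciding the outcome, at the cost of multiplying the training time by `repeats`.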

Hyperparameter | Grid range | Value of the optimal combination |
---|---|---|
Units in layer 1 | 8, 16, 32 | 16 |
Units in layer 2 | 8, 16, 32 | 8 |
Time window length | 30, 60, 90 | 60 |
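As a quick consistency check, the number of trainable parameters implied by the optimal combination can be counted by hand. The sketch assumes a univariate input (one strain value per time step) and a single-unit fully connected output layer; both are assumptions, although the resulting total agrees with the 1961 parameters reported in the conclusions.

```python
def lstm_layer_params(input_dim, units):
    """Parameters of one LSTM layer.

    Four weight sets (forget, input, and output gates plus the cell
    candidate), each with input weights, recurrent weights, and a bias.
    """
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, output_dim):
    """Parameters of a fully connected layer (weights plus biases)."""
    return input_dim * output_dim + output_dim


total_params = (
    lstm_layer_params(input_dim=1, units=16)    # first LSTM layer: 1152
    + lstm_layer_params(input_dim=16, units=8)  # second LSTM layer: 800
    + dense_params(input_dim=8, output_dim=1)   # output layer: 9
)
print(total_params)  # 1961
```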

The output dimension of the second LSTM layer is the same as the dimension of the units in this layer, while the output of the model needs to be consistent with the dimension of label

After standardizing training data and selecting hyperparameters, the parameters of the model can be trained iteratively. The adaptive moment estimation algorithm serves as the optimizer, dynamically setting the learning rate by assessing gradients’ first and second moments. The Mean Squared Error (MSE) was selected as the loss function. It is one of the most commonly used loss functions in machine learning and gives a high penalty for outliers in sequences.

To avoid the influence of randomness on the model, the network was trained 50 times, and the distribution of the MSE on the test set was recorded. The distribution was skewed to the right, as shown in the corresponding figure, and the maximum acceptable MSE was set to 2 × 10^{−8}. If the error exceeds this threshold, the weights are randomly reinitialized and the model is retrained until it satisfies the error criterion.

Similar to the ARIMA model, the dynamic LSTM modeling approach is discussed in this section. Based on test results, the LSTM model maintained robust prediction performance for at least 1 h after fitting. Rebuilding the LSTM is time-intensive because of the model's parameter complexity and the repeated training required to mitigate the effects of randomness. Thus, the model's update interval was set to 1 h. Specifically, with the LSTM network structure and hyperparameters unchanged, the model is retrained every hour on the data of the previous hour, with a training-to-test ratio of 5:1. In this way, the model is updated with data from the previous hour and inherits the prior hour's parameters as initial values to expedite training. Dynamic modeling continues to use the maximum allowable error set for the static model, so the training loop stops only when the model's MSE falls below 2 × 10^{−8}.
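The hourly retraining loop can be outlined as below. This is a sketch under stated assumptions, not the study's code: `train_fn` and `evaluate_fn` are hypothetical callbacks for training on the previous hour's data and scoring on its test split, `train_fn(None)` is assumed to start from random weights, and the `max_attempts` cap is an added safeguard that the text does not mention.

```python
def retrain_until_acceptable(train_fn, evaluate_fn, previous_weights,
                             max_mse=2e-8, max_attempts=20):
    """Retrain until the test MSE falls below max_mse.

    The first attempt warm-starts from the previous hour's weights; later
    attempts fall back to a fresh random initialization.
    """
    start = previous_weights
    for attempt in range(1, max_attempts + 1):
        weights = train_fn(start)
        mse = evaluate_fn(weights)
        if mse < max_mse:
            return weights, mse, attempt
        start = None  # reset: the next attempt uses random initial weights
    raise RuntimeError("no acceptable model within max_attempts")
```

Returning the attempt count makes it easy to log how often the warm start alone was sufficient.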

Due to the large amount and the similarities among SHM data, other SHM data (such as ground motion and joint deformation) were not analyzed for validation in this study. The temperature and humidity, which are categorized as environmental loads, are strongly influenced by natural conditions and need to be analyzed separately. Therefore, a two-day concrete strain data sample was introduced for model training of the LSTM network.

The prediction error for a 1 h duration, which represents the discrepancy between the ARIMA model's one-step prediction and the actual observation, was calculated. As depicted in

Given that standard deviation gauges data variance, the previous hour’s data standard deviation was employed to gauge the permissible range of error fluctuations. By adjusting the coefficient of standard deviation, the confidence interval of different severities was set to realize the hierarchical warning. Utilizing thresholds set at 5.5, 6.5, and 7.5 standard deviations, outliers could be identified, as illustrated in

The anomaly detection mechanism employed by the dynamic LSTM model closely resembles that of ARIMA.

Thresholds of 5.5, 7, and 8 standard deviations were employed to identify outliers, as highlighted in

Considering the sequence requirements, the ARIMA model necessitates a stationary nonwhite noise sequence for effective modeling. In contrast, LSTM exhibits wider applicability due to its non-restrictive nature. LSTM merely demands the partitioning of original data into training samples and labels, thereby enabling broader adaptability compared to the ARIMA model.

Sample size plays a pivotal role in modeling. ARIMA requires a substantial sample size for statistical inference, with studies indicating a minimum requirement of at least 50 historical data points for acceptable results [

In terms of modeling speed, LSTM involves more parameters and a more intricate structure than ARIMA, leading to longer iterative calculations for parameter adjustment. Experiments showed that static ARIMA modeling was completed in 0.55 s, whereas static LSTM modeling took approximately 131.7 s. Hence, ARIMA significantly surpasses LSTM in modeling speed. Both models ran on the same laptop, equipped with an Intel Core i7-10750H processor, 16 GB of RAM, a 1 TB SSD, and an NVIDIA GeForce RTX 2060 Max-Q graphics card.

Regarding model stability, ARIMA operates deterministically; given data and hyperparameters, estimated model parameters are deterministic. In contrast, LSTM's modeling process is influenced by random factors like initial weight settings and batch selection, contributing to variable model outcomes across runs. To ensure accuracy, repeated modeling and setting of maximum acceptable errors are integrated into LSTM training. Consequently, ARIMA results tend to exhibit greater model stability than LSTM results.

For a fair comparison, it is necessary to establish the anomaly detection criteria.

Std. coefficient | LSTM: number of anomalies | LSTM: proportion of anomalies | ARIMA: number of anomalies | ARIMA: proportion of anomalies |
---|---|---|---|---|
5 | 108 | 0.136% | 151 | 0.183% |
5.5 | 78 | 0.098% | 93 | 0.112% |
6 | 58 | 0.073% | 60 | 0.073% |
6.5 | 46 | 0.058% | 39 | 0.047% |
7 | 34 | 0.043% | 27 | 0.033% |
7.5 | 26 | 0.033% | 17 | 0.021% |
8 | 17 | 0.021% | 10 | 0.012% |

Hierarchical warnings are formulated based on anomaly proportions, with corresponding thresholds established, as indicated in

Proportion of anomalies | Std. coefficient of ARIMA | Std. coefficient of LSTM | Warning level | Color |
---|---|---|---|---|
0.1% | 5.5 | 5.5 | Third-level | Yellow |
0.05% | 6.5 | 7 | Second-level | Orange |
0.02% | 7.5 | 8 | First-level | Red |
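The hierarchical warning rule can be expressed as a small lookup. The coefficients are taken from the table above, while the function name, the dictionary layout, and the assumption that prediction errors are centered on zero are illustrative choices.

```python
# Standard deviation coefficients per model and warning level (from the table).
WARNING_COEFFICIENTS = {
    "ARIMA": {"third": 5.5, "second": 6.5, "first": 7.5},
    "LSTM": {"third": 5.5, "second": 7.0, "first": 8.0},
}


def warning_level(error, sigma, model):
    """Return 'first', 'second', 'third', or None for a prediction error.

    error: prediction error (observation minus prediction)
    sigma: standard deviation of the previous hour's data
    model: 'ARIMA' or 'LSTM'
    """
    coefficients = WARNING_COEFFICIENTS[model]
    deviation = abs(error)
    # Check the strictest level first so the most severe warning wins.
    for level in ("first", "second", "third"):
        if deviation > coefficients[level] * sigma:
            return level
    return None
```

Because the two models use different coefficients, the same deviation can map to different levels: a deviation of 6.8σ triggers a second-level warning for ARIMA but only a third-level warning for LSTM.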

This section delves into the distribution and specifics of anomalies identified by both models across first to third-level thresholds. To provide a clearer perspective, anomalies from both methods are plotted on the same graph.

1) First-Level Warning

First-level warnings have the largest standard deviation coefficient, the strictest identification standard, and the fewest outliers identified. These require immediate attention and emergency measures from operational personnel. As shown in

2) Second-Level Warning

Second-level warnings are issued when anomalies are detected according to moderate standards.

3) Third-Level Warning

With more liberal anomaly detection criteria, third-level warnings encompass numerous anomalies with relatively mild severity (

LSTM identified fewer anomalies in specific locations compared to ARIMA, attributed to divergent prediction data windows. For instance, ARIMA predicted using historical data from the last 5 s, while LSTM employed 100 s. Thus, if the value drops rapidly for 10 s at the same rate, ARIMA will not alarm for the last 5 s; LSTM, on the other hand, alerts all data within 10 s, being more sensitive to extended data changes.

Moreover, both models exhibited increased warnings between 40,000 and 50,000 s, corresponding to the day’s data peak. It was supposed that before the data trend changed, other hidden features, such as amplitude and frequency, fluctuated in advance. The LSTM model could detect such changes and provide an early warning.

Based on the preceding analysis, the two models exhibited distinct characteristics, summarized in

Benchmark | ARIMA | LSTM |
---|---|---|
Model type | Classic time series model | Recurrent neural network |
Model interpretation | White box | Black box |
Requirements for sequence | Stationary, nonwhite noise | No extra requirement |
Requirements for sample size | Large | Extremely large |
Number of hyperparameters | Relatively small | Large |
Model updating time | Short | Relatively long |
Model stability | Deterministic model | Stochastic model |
Sensitivity to short-term anomalies | Very sensitive | Moderately sensitive |
Sensitivity to long-term anomalies | Inapplicable | Sensitive |
Warning lead time | Moderate performance | Good performance |

In conclusion, LSTM places fewer constraints on the sequence during modeling, but requires a larger sample size and yields a more intricate, less interpretable model. While ARIMA is adept at detecting short-term sequence fluctuations, its ability to detect medium- to long-term anomalies is limited. Moreover, ARIMA tends to issue warnings after an anomaly has occurred, so the timing of its early warnings is unsatisfactory. In contrast, LSTM is more proficient at predicting long-term sequence trends and delivers better early warning performance. It should also be noted that the two models differ in data applicability: ARIMA is stricter on the data and requires a series of statistical tests and validations, as demonstrated in

For a tunnel SHM system, a large data set is available for deep learning, so the learning potential of the LSTM model can be fully utilized. Generally, LSTM yields more accurate anomaly detection outcomes than ARIMA. However, ARIMA's advantage in monitoring short-term sequences should not be disregarded. Therefore, combining the two methods seems promising for future monitoring system designs: LSTM would serve as the primary method for long-term trend monitoring and early warning, while ARIMA would act as a supplementary tool with stringent threshold criteria for prompt short-term anomaly identification.

This study presented a hierarchical model-based approach for anomaly detection and evaluated it by a comparative analysis using SHM data from the HZMB immersed tunnel. The conclusions are as follows:

The concrete strain data of immersed tunnel elements were used in this paper, and both ARIMA and LSTM could realize the dynamic model-based approach for anomaly detection. The ARIMA model had the structure ARIMA (5, 2, 0) and a short modeling time of only 0.55 s, whereas the LSTM model comprised 1961 parameters and required 131.7 s for modeling. However, ARIMA was slightly weaker than LSTM in prediction accuracy, so different criteria were used for updating the rolling dynamic models (5 × 10^{−8} for ARIMA and 2 × 10^{−8} for LSTM). Therefore, in terms of modeling features, ARIMA has better real-time performance and LSTM has better prediction accuracy.

The dynamic model-based approach uses a specified multiple of the standard deviation as the outlier screening criterion. The ARIMA-based and LSTM-based models use very similar coefficients. For the first-level warning, which is the strictest, the coefficients of the two were in the range of 7.5 to 8. For the second-level warning, the coefficients were in the range of 6.5 to 7. For the third-level warning, the coefficients were both 5.5. This suggests that there is little difference between the two models in their ability to identify outlier data.

In terms of data requirements, the ARIMA-based model requires stationary, nonwhite noise sequences, while the LSTM-based model has no additional sequence requirements. The comparative analysis indicated that ARIMA was highly sensitive to short-term anomalies, whereas LSTM was sensitive to long-term anomalies and therefore performed better in early warning. Combining LSTM for long-term trend monitoring and early warning with ARIMA as a supplementary tool for swift short-term anomaly identification is therefore suggested.

We acknowledge the support given by the Hong Kong-Zhuhai-Macao Bridge Authority.

This work was supported by the Research and Development Center of Transport Industry of New Generation of Artificial Intelligence Technology (Grant No. 202202H), the National Key R&D Program of China (Grant No. 2019YFB1600702), and the National Natural Science Foundation of China (Grant Nos. 51978600 & 51808336).

The authors confirm their contributions to the paper as follows: study conception and design: Q. Ai, Q. Lang; data collection: X. Jiang, Q. Jing; analysis and interpretation of results: Q. Ai, H. Tian, H. Wang, X. Huang, X. Jiang, Q. Jing; draft manuscript preparation: Q. Ai, Q. Lang. All authors reviewed the results and approved the final version of the manuscript.

Data is available on request to the authors.

The authors declare that they have no conflicts of interest to report regarding the present study.