Near-fault impulsive ground-shaking is highly destructive to engineering structures, so its accurate identification is a top priority in the engineering field. However, because traditional methods do not comprehensively consider the ground-shaking characteristics, their generalization and identification accuracy are low. To address these problems, an impulsive ground-shaking identification method combined with deep learning, named PCA-LSTM, is proposed. Firstly, the ground-shaking characteristics were analyzed and the ground-shaking data was annotated using Baker’s method. Secondly, the Principal Component Analysis (PCA) method was used to extract the features most relevant to impulsive ground-shaking. Thirdly, a Long Short-Term Memory network (LSTM) was constructed, and the extracted features were used as the input for training. Finally, the identification results for the Artificial Neural Network (ANN), Convolutional Neural Network (CNN), LSTM, and PCA-LSTM models were compared and analyzed. The experimental results showed that the proposed method improved the accuracy of pulsed ground-shaking identification by >8.358% and identification speed by >26.168%, compared to other benchmark models.

In recent times, several studies have shown that impulsive ground-shaking has a particularly destructive effect on buildings and structures [

Numerous studies have proposed a series of different identification methods for impulsive ground-shaking, which essentially solve the problems due to manual subjective identification. Baker et al. [

With the rapid development of key technologies in the field of computing, deep learning (DL), an important branch of artificial intelligence, originated from the research and development of Artificial Neural Network (ANN) [

In recent years, DL has become an effective mathematical analysis tool, gradually applied to various types of geophysical studies [

Compared to other DL methods, the Long Short-Term Memory network (LSTM) [

The remainder of this paper is organized as follows:

As shown in

The ground-shaking data used in this paper was obtained from the earthquake database provided by the United States Geological Survey (USGS). The Chi-Chi earthquake of 1999 [

The velocity impulse ground-shaking data is often characterized by large amplitudes, long characteristic periods, large instantaneous cumulative energies, and strong non-stationarity [

After the screening, the acceleration data was removed from the dataset and all the subsequent studies were based on the remaining ground-shaking velocity datasets (a total of 356 data points).

The velocity impulse type ground-shaking usually has the following characteristics: 1) energy concentration: a large amount of energy is released in a short period of time; 2) abruptness: the velocity increases or decreases suddenly and unpredictably within a short period of time; and 3) a large ratio of peak velocity to peak acceleration. These velocity impulse characteristics were combined with a widely used energy-based pulse identification method. The velocity and acceleration information in the raw data used for labeling is shown in

In the raw data,

1) Peak ground velocity:

2) Pulse indicator:

3) The timing of the appearance of the large velocity pulse should match

Because of its complexity, the entire original ground-shaking data cannot be input directly into the ANN for training, as doing so would decrease the training efficiency. Processing the original data before training is therefore imperative. Each ground-shaking record is characterized by a small selection of key characteristic values, so that a limited set of parameters effectively represents the whole record.

Comprehensively characterizing the ground-shaking data involves several key characteristic values: the earthquake duration (T), the PGV, the times required for the seismic intensity to reach 5% (T5), 75% (T75), and 95% (T95) of its peak, and the durations for the seismic intensity to rise from 5% to 75% (D5_75) and from 5% to 95% (D5_95) of its peak. Because the physical quantities are related, the velocity pulse data can reflect the characteristics of the acceleration pulse data to a certain extent, whereas introducing the acceleration data would significantly increase the computational complexity. Therefore, only the velocity pulse data was used as the input for this study. Additionally, to facilitate both training and testing, the ground-shaking data was appropriately labeled (

No. | T | PGV | T5 | T75 | T95 | D5_75 | D5_95 | Label |
---|---|---|---|---|---|---|---|---|
1 | 149.98 | 40.4224 | 6.62 | 11.30 | 16.32 | 4.68 | 9.70 | 1 |
2 | 149.98 | 91.6714 | 7.58 | 11.78 | 15.96 | 4.20 | 8.38 | 1 |
3 | 149.98 | 76.1410 | 7.58 | 11.92 | 17.12 | 4.34 | 9.54 | 1 |
4 | 149.98 | 5.0782 | 9.76 | 22.00 | 34.60 | 12.24 | 24.84 | 0 |
5 | 149.98 | 5.2000 | 18.32 | 24.38 | 31.92 | 6.06 | 13.60 | 0 |
6 | 140.77 | 8.1114 | 44.80 | 62.82 | 78.75 | 18.02 | 33.95 | 0 |
7 | 152.99 | 19.1733 | 17.87 | 28.09 | 50.10 | 10.22 | 32.23 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
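The relationship between the threshold times and the two duration features can be checked directly against the tabulated rows. The sketch below assumes that D5_75 and D5_95 are simple differences of the threshold times (T75 − T5 and T95 − T5); the function name `durations` is illustrative and not taken from the paper.

```python
# Rows 1-3 of the feature table: (T5, T75, T95, D5_75, D5_95).
rows = [
    (6.62, 11.30, 16.32, 4.68, 9.70),
    (7.58, 11.78, 15.96, 4.20, 8.38),
    (7.58, 11.92, 17.12, 4.34, 9.54),
]


def durations(t5, t75, t95):
    """Return (D5_75, D5_95), assumed to be differences of the threshold times."""
    return round(t75 - t5, 2), round(t95 - t5, 2)


for t5, t75, t95, d5_75, d5_95 in rows:
    # Print the recomputed pair next to the tabulated pair for comparison.
    print(durations(t5, t75, t95), (d5_75, d5_95))
```

Under this assumption the recomputed values agree with the tabulated ones, which also implies that the two duration features are linear combinations of T5, T75, and T95.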

PCA is one of the most commonly used methods for processing complex input data and removing weakly correlated data. It maps the original high-dimensional dataset to a low-dimensional one that reflects the key features of the original dataset to the greatest possible extent, which reduces the risk of overfitting. At the same time, the reduction in dimensionality shrinks the data volume and considerably improves the ANN training speed. In this study, because the ground-shaking data is complex and its parameters are strongly correlated, PCA is used to reduce the number of parameters while retaining as much of the key information in the original data as possible, thereby increasing model training speed and identification efficiency.

The feature extraction steps for the ground-shaking data by PCA included: 1) data normalization; 2) covariance matrix calculation; 3) eigenvalue and eigenvector calculation; 4) eigenvalue ranking and selection; 5) dimensionality reduction and feature selection. Each of these steps is explained below in detail.

Data standardization refers to the process of transforming data with different units and scales into scale-free data so that they become comparable. For machine learning models, scale differences can have a large impact on model accuracy, and standardization makes the model more stable and accurate in training and prediction. In this study, the parameters of the ground-shaking data vary widely in scale and distribution interval, so a standardization transformation was applied to bring the values of the different parameters into intervals of similar size. The specific process is as follows.

Let

Subsequently, a standardization transformation is applied to all elements of the matrix:
_{ij}), respectively.
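A minimal sketch of the standardization step follows, assuming the conventional z-score form (subtract each parameter's mean and divide by its standard deviation). The function name `standardize_columns` and the use of the population standard deviation are illustrative choices, not details confirmed by the paper.

```python
from statistics import mean, pstdev


def standardize_columns(matrix):
    """Z-score each column: subtract the column mean, divide by the column std."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        # A constant column (std == 0) is mapped to 0.0 to avoid division by zero.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in matrix
    ]


# Toy input: (T, PGV, T5) of rows 1, 4, 6 and 7 of the feature table.
samples = [
    [149.98, 40.4224, 6.62],
    [149.98, 5.0782, 9.76],
    [140.77, 8.1114, 44.80],
    [152.99, 19.1733, 17.87],
]
scaled = standardize_columns(samples)
for row in scaled:
    print([round(v, 3) for v in row])
```

After the transformation every column has zero mean and unit standard deviation, so PGV and the time parameters contribute on a comparable scale.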

Firstly, the covariance matrix is calculated for the normalized data, and the eigenvalues are obtained using the decomposition method. Then, the eigenvalues are sorted and the largest

If the ground-shaking data in the matrix is represented by

A linear transformation can usually be fully described by the eigenvalues and eigenvectors. Therefore, in the PCA process, it is necessary to use the eigen-equations and eigenvectors to transform the vector space formed by the original data. In this method, the non-negative eigenvalues,

To indicate the reflection of the principal components on the original dataset information, the concepts of principal component and cumulative contribution ratios can be introduced in the selection process. They indicate the degree of expression of the original dataset information by using single and multiple principal components, respectively. The contribution ratio of the

By analogy, the weight of the sum of the eigenvalues of the first

Comprehensively considering the accuracy and realistic conditions, when

Let the required principal component vector be
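The PCA steps described above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the function name `pca`, the cumulative-contribution threshold of 0.85, and the synthetic input are all hypothetical.

```python
import numpy as np


def pca(data, cumulative_threshold=0.85):
    """PCA via eigen-decomposition of the covariance matrix of standardized data."""
    x = np.asarray(data, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant columns
    z = (x - x.mean(axis=0)) / std           # step 1: standardization
    cov = np.cov(z, rowvar=False)            # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 3: eigenvalues (ascending)
    order = np.argsort(eigvals)[::-1]        # step 4: rank in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    contribution = eigvals / eigvals.sum()   # contribution ratio per component
    cumulative = np.cumsum(contribution)     # cumulative contribution ratio
    k = int(np.searchsorted(cumulative, cumulative_threshold)) + 1
    k = min(k, len(eigvals))
    scores = z @ eigvecs[:, :k]              # step 5: project onto k components
    return scores, contribution, cumulative, k


# Synthetic data: the third column is the sum of the first two, so one
# eigenvalue is (numerically) zero, as for a linearly dependent feature.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
data = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1]])
scores, contribution, cumulative, k = pca(data)
print(k, np.round(contribution, 4))
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; it returns real eigenvalues in ascending order, which is why the explicit re-ordering step is needed.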

In this study, an LSTM network, which is highly effective in processing time-series data, was used to identify impulsive ground-shaking.

LSTM was developed from the recurrent neural network (RNN), which is primarily used for sequence-type data. The neurons in the hidden layer of an RNN are connected in a chain structure, which allows data to be passed cyclically through the network and the input data to be memorized.

The horizontal structure of LSTM also forms a chain composed of repeated cellular units (

The principle of the LSTM model for impulsive ground-shaking prediction mainly includes the following processes:

Input ground-shaking data: The ground-shaking data at time

Forgetting gate calculation: The hidden layer state,

Input gate calculation: The input gate determines the amount of information to be added to the cell state. It contains two layers: sigmoid and hyperbolic tangent (

Update cell state: The cell state,

Output gate calculation: The hidden layer state,
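The gate calculations listed above can be illustrated with a single forward step of a standard LSTM cell. The sketch assumes the textbook formulation (sigmoid forget, input, and output gates, and a tanh candidate state); the parameter names, the hidden size of 4, and the random weights are hypothetical and not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell."""
    concat = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])     # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])     # input gate
    c_hat = np.tanh(params["W_c"] @ concat + params["b_c"])   # candidate state
    c_t = f_t * c_prev + i_t * c_hat                          # cell state update
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])     # output gate
    h_t = o_t * np.tanh(c_t)                                  # new hidden state
    return h_t, c_t


hidden, n_in = 4, 5   # five input features; the hidden size is an assumption
rng = np.random.default_rng(0)
params = {}
for gate in "fico":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden, hidden + n_in))
    params[f"b_{gate}"] = np.zeros(hidden)

x_t = rng.normal(size=n_in)
h_t, c_t = lstm_step(x_t, np.zeros(hidden), np.zeros(hidden), params)
print(h_t.shape, c_t.shape)
```

Because the hidden state is the product of a sigmoid output and a tanh output, every component of `h_t` lies strictly between -1 and 1.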

The predicted output of the LSTM model is then compared to the true results, and the error between the two is calculated and back-propagated to update the model parameters. Error backpropagation proceeds in the opposite direction to forward propagation, i.e., layer by layer from the output layer to the input layer. At each layer, the error gradient is calculated and the weights and thresholds are updated accordingly. In this manner, the neural network (NN) gradually reduces the error and gets progressively closer to the desired output. During backpropagation, the output of the network is compared with the previously annotated label, and the error of the network is quantified using the cross-entropy loss function. The loss calculation process for a certain result can be expressed as

After the error has been quantified, the NN (i) back-propagates the error signal layer by layer along the direction of steepest descent of the relative error sum of squares, and (ii) calculates the adjustment amounts and updates the weights and thresholds of each neuron, so that the network outputs gradually approximate the real values.
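As an illustration of the loss calculation and the parameter update, the sketch below assumes the binary form of the cross-entropy loss, L = -[y ln p + (1 - y) ln(1 - p)], and applies one gradient-descent update to a single sigmoid output unit. The function names and the toy input are hypothetical, and the single unit stands in for the full network.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy between a 0/1 label and a predicted probability."""
    p = min(max(p_pred, eps), 1.0 - eps)     # clamp to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def sgd_step(w, b, x, y_true, lr=0.05):
    """One gradient-descent update; returns new weights, bias, and prior loss."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    grad = p - y_true                        # dL/dz for sigmoid + cross-entropy
    new_w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    new_b = b - lr * grad
    return new_w, new_b, binary_cross_entropy(y_true, p)


w, b = [0.0, 0.0], 0.0
x, y = [1.0, -0.5], 1                        # toy sample with a pulse label
w, b, loss_before = sgd_step(w, b, x, y)
_, _, loss_after = sgd_step(w, b, x, y)
print(round(loss_before, 4), round(loss_after, 4))
```

With zero initial weights the predicted probability is 0.5, so the first loss equals ln 2; the loss evaluated after the update is lower, which is the behavior described in the text.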

To validate the effectiveness of the proposed PCA-LSTM model, validation experiments based on the impulsive ground-shaking dataset constructed in

The PCA method can map high dimensional data to a lower dimension, and in the field of pulsed ground-shaking recognition, it can extract the most important features of impulsive ground-shaking, reduce the redundancy of data, highlight the key features, and help the LSTM model to better learn and understand the impulsive ground-shaking. The results after PCA of the ground-shaking data are shown in

Ground-shaking features | Variance | Singular value | Rank |
---|---|---|---|
T | 0.1669 | 14.503 | 2 |
PGV | 0.7049 | 29.803 | 1 |
T5 | 0.0668 | 9.1239 | 3 |
T75 | 0.0159 | 4.4858 | 5 |
T95 | 0.0461 | 7.6175 | 4 |
D5_75 | 0 | 0 | 6 |
D5_95 | 0 | 0 | 7 |

Finally, the ground-shaking features were ranked by PCA. Based on the calculated variance and singular values, the top five ranked features, namely PGV, T, T5, T95, and T75, were selected and used as the inputs for the subsequent LSTM model training.
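The ranking and the cumulative contribution can be reproduced from the tabulated values. The sketch assumes that the "Variance" column lists the explained-variance (contribution) ratio of each entry; the variable names are illustrative.

```python
# "Variance" column of the PCA results table, keyed by feature name.
variance = {
    "T": 0.1669, "PGV": 0.7049, "T5": 0.0668, "T75": 0.0159,
    "T95": 0.0461, "D5_75": 0.0, "D5_95": 0.0,
}

# Rank in descending order and accumulate the contribution ratios.
ranked = sorted(variance.items(), key=lambda kv: kv[1], reverse=True)
cumulative = []
total = 0.0
for name, ratio in ranked:
    total += ratio
    cumulative.append((name, round(total, 4)))

top_five = [name for name, _ in ranked[:5]]
print(top_five)
print(cumulative)
```

Under this reading, the first two entries already account for about 87% of the variance and the first five for essentially all of it (the tabulated values sum to 1.0006 because of rounding), while the two duration features contribute nothing.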

To guarantee optimal model performance, attention should be given to enhancing the model’s generalizability and mitigating overfitting during the training process. The dataset containing the main features extracted by PCA was randomly divided into training, validation, and test sets in a ratio of 7:1:2. The training set was used to train the LSTM model and update its parameters, so that the model can determine from the input data whether a record contains impulsive ground-shaking; the validation set was used to evaluate the model accuracy and generalization performance during the training process; and the test set was used to evaluate the accuracy and generalizability of the trained model. Normally, the validation and test sets are not involved in updating the model.
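A random 7:1:2 split can be sketched as follows. The rounding convention (floor for the training and validation sets, remainder to the test set), the seed, and the function name `split_indices` are assumptions; the sample count of 356 is the dataset size stated earlier.

```python
import random


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 7:1:2 into train/val/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = n_samples * 7 // 10
    n_val = n_samples * 1 // 10
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]          # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(356)
print(len(train_idx), len(val_idx), len(test_idx))  # 249 35 72
```

The three index lists are disjoint by construction, so no validation or test sample can leak into the training set.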

When training the LSTM model, the parameters need to be defined, including the number of model training rounds, the choice of optimizer, batch size, and so on (

Symbol | Definition | Value |
---|---|---|
epoch | number of training epochs | 100 |
lr | learning rate | 0.05 |
batch_size | batch size | 20 |
Loss | loss function | Cross-entropy loss |
optimizer | optimizer | Stochastic gradient descent |
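To show how the tabulated settings interact, the sketch below iterates over shuffled mini-batches and counts the resulting parameter updates. The training-set size of 249 (70% of 356 records) and the helper `minibatches` are assumptions; the actual model update is replaced by a counter.

```python
import random

config = {"epoch": 100, "lr": 0.05, "batch_size": 20}


def minibatches(indices, batch_size, rng):
    """Yield shuffled mini-batches; the last batch may be smaller."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]


rng = random.Random(0)
train_indices = list(range(249))   # assumed training-set size (70% of 356)
updates = 0
for _ in range(config["epoch"]):
    for batch in minibatches(train_indices, config["batch_size"], rng):
        updates += 1               # one SGD parameter update per mini-batch
print(updates)  # 1300
```

With these settings each epoch consists of 13 mini-batches (12 of size 20 and one of size 9), giving 1300 updates over 100 epochs.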

Both loss metrics of the LSTM model gradually decreased and stabilized during the training process. After ~50 iterations, both training and validation losses plateaued. Eventually, the training loss stabilized at 0.010 and the validation loss stabilized at ~0.012. It can be seen that the model not only obtains good performance on the training set, but also has a good fitting effect on the validation set, verifying the generalizability of the model.

To further verify the PCA-LSTM model performance, the evaluation system for the impulsive ground-shaking identification was constructed, and the model was comprehensively evaluated in terms of two important factors, i.e., accuracy rate

To further illustrate the advantages of the proposed PCA-LSTM model, it was trained using the same training set and parameters as the ANN, CNN, and LSTM models. Comparative analyses were then performed for the same test set and the evaluation metrics described in

Model | Accuracy, pulsed (%) | Accuracy, non-pulsed (%) | Speed (s) |
---|---|---|---|
ANN | 85.128 | 87.536 | 0.0432 |
CNN | 89.642 | 89.971 | 0.0826 |
LSTM | 96.675 | 97.011 | 0.0214 |
PCA-LSTM | 96.670 | 97.001 | 0.0158 |
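The relative gains can be computed directly from the tabulated values. The sketch assumes that the "Speed (s)" column is an identification time, so that a lower value is better; the function name `relative_speedup` is illustrative.

```python
# (pulsed accuracy %, non-pulsed accuracy %, identification time s) per model.
results = {
    "ANN": (85.128, 87.536, 0.0432),
    "CNN": (89.642, 89.971, 0.0826),
    "LSTM": (96.675, 97.011, 0.0214),
    "PCA-LSTM": (96.670, 97.001, 0.0158),
}


def relative_speedup(baseline, candidate="PCA-LSTM"):
    """Percentage reduction in identification time relative to the baseline."""
    t_base = results[baseline][2]
    t_new = results[candidate][2]
    return 100.0 * (t_base - t_new) / t_base


for name in ("ANN", "CNN", "LSTM"):
    # Gain in pulsed accuracy (percentage points) and relative time reduction.
    gain = results["PCA-LSTM"][0] - results[name][0]
    print(name, round(gain, 3), round(relative_speedup(name), 3))
```

Relative to the plain LSTM, the time reduction evaluates to 26.168%, while the pulsed accuracy of the two models differs by only 0.005 percentage points.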

Currently, due to limitations in resources, energy, and external conditions, certain shortcomings in the presented work remain. The following aspects will need to be further investigated:

Ground-shaking signals exhibit various features across different frequency ranges. Future research can explore effective methods for integrating these multi-scale features to harness both global and local characteristics of the ground-shaking signals.

Subsequent research efforts may consider incorporating data augmentation techniques, model integration methods, and anomaly detection algorithms to enhance the robustness and reliability of the model.

To accurately and efficiently recognize impulsive ground-shaking and reduce the damage to engineering structures, a combined DL recognition model, named PCA-LSTM, was introduced in this paper. The detailed information of the model is presented as follows:

The model construction was mainly based on the analysis and identification of impulsive ground-shaking features and the annotation of the ground-shaking data using the traditional method proposed by Baker [

Training and testing of the model: After constructing the ground-shaking dataset, the most relevant ground-shaking features were extracted using the PCA method. Subsequently, the ground-shaking dataset was updated and only the extracted feature values were retained. This reduced data redundancy and improved the efficiency of model training and identification. Finally, the reconstructed dataset was divided, trained, and analyzed for comparison.

Advantages: Compared to other benchmark models, the proposed PCA-LSTM model showed excellent performance in terms of identification accuracy and speed, greatly improving both for pulsed ground-shaking identification. In addition, the model can be applied to solve practical engineering problems. It is of great significance for seismic monitoring and structural engineering design, and thus improves, to a certain extent, our ability to mitigate seismic hazards and safeguard lives and property.

None.

The author received no specific funding for this study.

The author confirms contribution to the paper as follows: study conception and design: Yizhao Wang; data collection: Yizhao Wang; analysis and interpretation of results: Yizhao Wang; draft manuscript preparation: Yizhao Wang. All authors reviewed the results and approved the final version of the manuscript.

The ground-shaking data used in this paper was obtained from the earthquake database provided by the United States Geological Survey (USGS).

The author declares that he has no conflicts of interest to report regarding the present study.
