As large underground projects become common in dense urban areas, a practical question arises: which model best predicts Tunnel Boring Machine (TBM) performance in these tunneling projects? Predicting TBM performance in complex geological conditions remains a major challenge for practitioners and researchers, yet a reliable and accurate prediction is essential for planning a workable tunnel construction schedule. TBM performance is difficult to estimate because it depends on many geotechnical and geological factors as well as machine specifications. Most intelligent techniques previously proposed in this field rely on a single (base) model with a low level of accuracy. Hence, this study introduces a hybrid random forest (RF) technique optimized by global harmony search with generalized opposition-based learning (GOGHS) for forecasting TBM advance rate (AR). The main purpose of the GOGHS-RF model is to optimize the RF hyper-parameters, e.g., the number of trees and the maximum tree depth. In the modelling, a comprehensive database was used, with the parameters most influential on TBM performance as inputs and TBM AR as the output. To examine the capability of the GOGHS-RF model, three more hybrid models, namely particle swarm optimization-RF, genetic algorithm-RF and artificial bee colony-RF, were also constructed to forecast TBM AR. The developed models were evaluated using several performance indices, including the determination coefficient (R^{2}), root-mean-square error (RMSE), and mean absolute percentage error (MAPE). The results showed that GOGHS-RF estimates TBM AR more accurately than the other applied models. The newly developed GOGHS-RF model achieved R^{2} = 0.9937 and 0.9844 for the train and test stages, respectively, which are higher than those of a pre-developed RF. In addition, the importance of the input parameters was interpreted through the SHapley Additive exPlanations (SHAP) method, which showed that thrust force per cutter is the most important variable for TBM AR. The GOGHS-RF model can be used in mechanized tunnel projects for predicting and checking performance.

The population is growing at a rapid pace, which necessitates countries setting the stage for development, particularly in sectors such as tunnel construction and underground spaces. In the process of constructing a tunnel, two methods are commonly used by engineers: tunnel boring machine (TBM) and drilling-blasting [

In recent decades, many projects involving tunnel construction have implemented TBMs. This excavation method is becoming increasingly popular, particularly in the case of projects within dense urban regions with low excavation depth and high levels of risk induced by the external loading of neighboring structures [

The prediction of TBM performance such as penetration rate and advance rate (AR) has been considered a vital task for many researchers. Some researchers proposed empirical equations for the aforementioned task [

With the rapid advancement of artificial intelligence (AI) technology, an increasing number of machine learning (ML) techniques have been introduced into various engineering applications [^{2}, of 0.850) for the TBM penetration rate. In addition, the group method of data handling was used by Koopialipoor et al. [^{2} = 0.934 and root-mean-square-error (RMSE = 0.032).

Furthermore, for the prediction of the TBM performance, Li et al. [

The literature reviewed above shows that AI and ML techniques have been widely applied to geotechnical problems, especially to the prediction of TBM performance. However, many of the proposed models in this area are single or base intelligence techniques, and the performance of base models can in fact be improved using powerful optimization algorithms. Accordingly, this study proposes hybrid intelligent techniques in which the base model is the random forest (RF). Four heuristic algorithms were selected as optimization techniques: the genetic algorithm (GA), artificial bee colony (ABC), particle swarm optimization (PSO), and global harmony search with generalized opposition-based learning (GOGHS). The four resulting models, i.e., GOGHS-RF, PSO-RF, GA-RF and ABC-RF, are employed to predict the TBM AR in a variety of geological conditions. The models are then evaluated and compared to identify the best RF-optimized technique for estimating TBM AR. These models are recognized as state-of-the-art intelligent control systems that can be effectively applied to tunnel construction projects.

Selangor is one of the most developed states in Malaysia and has the highest population density in the country. As a result, there is a large demand for water supply to support the residents in the area. The Pahang-Selangor Raw Water Transfer (PSRWT) project aims to divert water from Pahang to Selangor through a tunnel excavated using three TBMs. Of the total tunnel length of 44.6 km, 39.4 km was excavated using these TBMs.

The PSRWT project has been undertaken in Peninsular Malaysia, located between Pahang and Selangor states. Pahang State is located to the east of Selangor State and has a lot of excess water resources in comparison with the state’s water demand. The objective of this project is to transfer 1890 million litres of water diverted from the Sematan River in Pahang to the South Klang Valley region in Selangor. The flow of the Sematan River is extracted to the reservoir by the pumping station next to the intake via a pipe to a connecting basin at the tunnel inlet. The connecting basin diverts the raw water to the outlet-connecting basin with the aid of gravity flow. Subsequently, the raw water will be transferred to the water treatment plant to purify it before it can be dispatched to the residents in the Klang Valley area.

Field observation was done for the PSRWT project in the middle of 2013 to assess the rock mass properties along the alignment of the tunnel. Rock mass classification is an essential parameter to be assessed during the preliminary design or planning of the project [

Input | Symbol | Average | Min | Max | Std. Dev |
---|---|---|---|---|---|
Rock quality designation (%) | RQD | 54.259 | 6.250 | 95.000 | 28.610 |
Rock mass rating | RMR | 72.894 | 44.000 | 95.000 | 16.101 |
Weathering zone | WZ | 1.699 | 1.000 | 3.000 | 0.693 |
Unconfined compressive strength (MPa) | UCS | 135.128 | 40.000 | 194.000 | 45.104 |
Brazilian tensile strength (MPa) | BTS | 10.321 | 4.690 | 15.680 | 4.066 |
Thrust force per cutter (kN) | TFC | 301.514 | 80.603 | 565.840 | 88.266 |
Revolutions per minute (rev/min) | RPM | 8.827 | 4.040 | 11.950 | 2.314 |
Output | Symbol | Average | Min | Max | Std. Dev |
Advance rate (m/h) | AR | 1.083 | 0.017 | 5.000 | 0.663 |

In addition, the binary continuous distribution of the seven input variables through the TBM AR cut, and the analysis of their outliers are clearly visible in the multivariate box line plot displayed in

Harmony search (HS) is one of the meta-heuristic algorithms that simulates the procedure of harmony production [

where,

where, the lower and upper bounds of the search space are presented by LB_{j} and UB_{j}, respectively.

After initialization, HS will enter the optimization loop, which mainly includes three search rules: pitch adjustment, random sampling, and memory consideration. Based on these three rules, four additional parameters need to be defined in HS, i.e., the harmony memory considering rate (HMCR), the pitch adjust rate (PAR), the bandwidth (BW) and the maximal iteration number (T). The function of these three search rules is to generate a new harmony vector
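To make the three search rules concrete, the following is a minimal sketch of a classic HS loop in plain Python. It is an illustration under stated assumptions rather than the implementation used in this study: the harmony memory size `hms`, the default parameter values, the scaling of the bandwidth by the search range, and the `sphere` test function are all hypothetical choices made for the example.

```python
import random

def harmony_search(objective, lb, ub, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   max_iter=2000, seed=0):
    """Minimise `objective` over the box [lb, ub] with a classic HS loop."""
    rng = random.Random(seed)
    dim = len(lb)
    # Initialisation: each harmony is drawn uniformly between LB_j and UB_j.
    memory = [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)]
              for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(max_iter):
        new = []
        for j in range(dim):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for dimension j.
                value = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment: small move scaled by the bandwidth
                    # (scaling by the range is an assumption of this sketch).
                    value += (2 * rng.random() - 1) * bw * (ub[j] - lb[j])
            else:
                # Random sampling: draw a fresh value from the search range.
                value = lb[j] + rng.random() * (ub[j] - lb[j])
            new.append(min(max(value, lb[j]), ub[j]))  # keep inside bounds

        # Replace the worst stored harmony if the new one is better.
        worst = max(range(hms), key=lambda i: scores[i])
        new_score = objective(new)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score

    best = min(range(hms), key=lambda i: scores[i])
    return memory[best], scores[best]


def sphere(x):
    """Toy objective with its minimum (zero) at the origin."""
    return sum(v * v for v in x)


best_x, best_f = harmony_search(sphere, lb=[-5.0, -5.0], ub=[5.0, 5.0])
```

Here HMCR decides between reusing memory and sampling at random, PAR decides whether a reused value is perturbed, and BW limits the size of that perturbation; the loop ends after T = `max_iter` improvisations.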

In the traditional HS algorithm, the search direction is random and unpredictable. Although this feature makes the algorithm less likely to miss the true optimum in the search space, the excessively random search direction greatly reduces its convergence speed and prevents it from dealing with complex problems effectively. When the practical problem is complex enough, an improved HS algorithm, called novel global HS (NGHS) [

The improved harmony search algorithm, NGHS, can greatly increase the exploitation capability of the overall model to meet the needs of the problems mentioned above. However, to further improve the global search ability of the model, the generalized opposition-based learning (GOBL) strategy is introduced, which was developed by Wang et al. [

Step 1. According to the running program of NGHS, GOGHS will first generate a new harmony V.

Step 2. Based on the GOGHS strategy, the generalized opposition-based solution OV (OV = [ov_{1}, ov_{2}, …, ov_{D}]) corresponding to the candidate solution V will be generated. The expression of ov_{j} is as follows:

where, ^{th} dimension are presented using

Once ov_{j} goes beyond the range defined by its lower and upper bounds, the value of ov_{j} will be expressed as follows:

Step 3. The third step is to evaluate

Step 4. When the search process goes to generation
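The opposition step in Step 2 can be sketched as follows. This is a hedged illustration that assumes the form commonly given in the GOBL literature, ov_j = k(a_j + b_j) - v_j with k drawn uniformly from [0, 1], and a uniform resampling inside [a_j, b_j] when the opposite value leaves the range. The symbols `a`, `b` and `k`, and the helper `pick_better` that keeps the better of V and OV, are assumptions of this sketch and are not taken from the text above.

```python
import random

def generalized_opposite(v, a, b, rng):
    """Return the generalized opposition-based counterpart of candidate v.

    a[j] and b[j] are assumed to be the current lower/upper bounds of
    dimension j (for example the min/max over the harmony memory).
    """
    k = rng.random()  # one random factor in [0, 1) shared by all dimensions
    ov = []
    for vj, aj, bj in zip(v, a, b):
        ovj = k * (aj + bj) - vj
        if ovj < aj or ovj > bj:
            # Out-of-range components are resampled inside [a_j, b_j].
            ovj = aj + rng.random() * (bj - aj)
        ov.append(ovj)
    return ov


def pick_better(objective, v, ov):
    """Keep whichever of V and OV has the lower objective value (assumed rule)."""
    return v if objective(v) <= objective(ov) else ov


rng = random.Random(1)
v = [4.0, -3.0]
a, b = [-5.0, -5.0], [5.0, 5.0]
ov = generalized_opposite(v, a, b, rng)
chosen = pick_better(lambda x: sum(t * t for t in x), v, ov)
```

With symmetric bounds, a_j + b_j = 0, so the opposite point is simply the reflection of V through the origin; with asymmetric bounds the random factor k spreads the opposite points over a wider part of the range.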

The decision tree is a simple and widely used model that branches by impurity calculation and can be used to deal with classification and regression problems. The Random Forest (RF) model is an integrated predictive algorithm proposed by Breiman [

RF improves on the construction and performance of a single decision tree. An ordinary decision tree selects the optimal feature among all P sample features at each node to partition the sub-tree. In contrast, RF randomly selects a subset of n sample features at each node (with n less than P) and chooses the best partition among them. This further enhances the generalization ability of the model and makes it perform better. Finally, the results of the large number of constructed decision trees are combined to obtain the final decision. The structural process of the RF model [
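The per-node feature subsampling and the final averaging can be illustrated with a deliberately small regression forest in plain Python. This is a teaching sketch, not the RF implementation used in the study: the function names, the squared-error split criterion, the bootstrap resampling of rows, and the synthetic data are assumptions chosen to keep the example self-contained.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, n_sub, max_depth, min_leaf, rng, depth=0):
    """Grow a regression tree that examines only n_sub random features per node."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None
    for j in rng.sample(range(len(X[0])), n_sub):  # random feature subset (n < P)
        for t in sorted(set(row[j] for row in X)):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return mean(y)
    _, j, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          n_sub, max_depth, min_leaf, rng, depth + 1)

    return (j, t, grow(left), grow(right))

def predict_tree(node, row):
    # Internal nodes are (feature, threshold, left, right); leaves are floats.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def fit_forest(X, y, n_estimators=25, max_depth=4, min_samples_leaf=1,
               n_sub=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(y)) for _ in y]  # bootstrap sample of the rows
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_sub, max_depth, min_samples_leaf, rng))
    return forest

def predict_forest(forest, row):
    # The final decision is the average of all tree outputs.
    return mean([predict_tree(tree, row) for tree in forest])


# Synthetic data: the target depends only on the first feature.
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [0.0 if i < 10 else 10.0 for i in range(20)]
forest = fit_forest(X, y, n_estimators=25, max_depth=4, min_samples_leaf=1, n_sub=1)
low = predict_forest(forest, [2.0, 0.0])
high = predict_forest(forest, [17.0, 0.0])
```

The three keyword arguments `n_estimators`, `max_depth` and `min_samples_leaf` mirror the hyper-parameters that the optimization algorithms tune later in the paper.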

In this study, GOGHS was applied to the parameter optimization of the RF prediction model. PSO, GA, and ABC optimization approaches were also used for comparison purposes. The main process for using the mentioned approaches is as follows:

(i) Data division/preparation: In this stage, the entire dataset should be divided into model training data and model test data. The division ratio should be based on the suggestions from previous investigations.

(ii) Initialization parameters: Set the parameters of the optimization method.

(iii) Fitness evaluation and update parameters: Compute the fitness function and optimize its parameter values according to the fitness.

(iv) Status check: When the stopping criteria are met, the best parameters are obtained.
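Steps (ii) to (iv) can be sketched as a generic tuning loop. The code below is only a skeleton under stated assumptions: the update rule (perturbing the current best candidate) is a simple placeholder and is not GOGHS, PSO, GA or ABC; the bounds and the `toy_fitness` function with its arbitrary optimum are hypothetical. In practice the fitness would train an RF on the training split from step (i) and return a validation error.

```python
import random

def tune(fitness, bounds, pop_size=20, max_iter=50, target=None, seed=0):
    """Generic population-based search over integer hyper-parameters."""
    rng = random.Random(seed)

    def perturb(center):
        cand = {}
        for name, (lo, hi) in bounds.items():
            step = max(1, (hi - lo) // 10)
            cand[name] = max(lo, min(hi, center[name] + rng.randint(-step, step)))
        return cand

    # (ii) Initialisation: random integer candidates inside the bounds.
    population = [{n: rng.randint(lo, hi) for n, (lo, hi) in bounds.items()}
                  for _ in range(pop_size)]
    # (iii) Fitness evaluation: lower is better (e.g. a validation RMSE).
    best = min(population, key=fitness)
    best_score = fitness(best)
    history = [best_score]

    for _ in range(max_iter):
        # (iii) Update: propose new candidates around the current best.
        for _ in range(pop_size):
            cand = perturb(best)
            score = fitness(cand)
            if score < best_score:
                best, best_score = cand, score
        history.append(best_score)
        # (iv) Status check: stop once the target is met (or at max_iter).
        if target is not None and best_score <= target:
            break
    return best, best_score, history


# Hypothetical search ranges for the three RF hyper-parameters.
bounds = {"n_estimators": (10, 500), "max_depth": (2, 30), "min_samples_leaf": (1, 20)}

def toy_fitness(p):
    # Stand-in for a validation error; the optimum below is arbitrary.
    return ((p["n_estimators"] - 120) ** 2 / 1e4
            + (p["max_depth"] - 9) ** 2
            + (p["min_samples_leaf"] - 3) ** 2)

best, best_score, history = tune(toy_fitness, bounds, target=0.0)
```

Swapping the placeholder update for a specific meta-heuristic changes only the body of the inner loop; the initialization, fitness evaluation and stopping check keep the same shape.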

The study decided to use the relevant index determination coefficient (R^{2}), root-mean-square error (RMSE), and mean absolute percentage error (MAPE) to assess the performance of the proposed models [^{2} is a statistical measure that the regression prediction is close to the true data point. R^{2} = 1 means that the regression prediction fits the data perfectly. RMSE is a measure of the size of the data prediction error. MAPE is a prediction accuracy index and 0% means that the model is perfect. In the following equation, where _{i−m} is the measured AR value, _{i−p} is the predicted AR value,
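For reference, the three indices can be computed as in the short sketch below. It assumes the standard textbook definitions (R^{2} as one minus the ratio of residual to total sum of squares, and MAPE expressed in percent); the function names and the sample values are illustrative and are not data from this study.

```python
import math

def r2_score(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, predicted):
    """Root-mean-square error, in the same unit as the target."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def mape(measured, predicted):
    """Mean absolute percentage error, in percent (measured values must be non-zero)."""
    n = len(measured)
    return 100.0 / n * sum(abs((m - p) / m) for m, p in zip(measured, predicted))


measured = [1.0, 2.0, 4.0]    # illustrative values only
predicted = [1.1, 1.8, 4.4]
r2_value = r2_score(measured, predicted)
rmse_value = rmse(measured, predicted)
mape_value = mape(measured, predicted)
```

Because MAPE divides each error by the measured value, samples with very small measured AR can dominate it, which is worth keeping in mind when comparing MAPE with RMSE.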

The RF is highly effective for processing high-dimensional data and handling nonlinear problems. As each tree is constructed independently, RF is quite robust in identifying outliers and avoiding overfitting [

To develop a hybrid TBM AR prediction model based on RF, this study combined the improved GOGHS meta-heuristic optimization method with RF and performed corresponding AR prediction tests. Following the Pareto principle, the AR database was randomly divided into two data sets at a ratio of 80%/20%: 80% of the data was used for training the TBM AR prediction model, and 20% was used for testing it. The model takes the seven previously mentioned influencing factors (UCS, RQD, RPM, RMR, BTS, WZ, and TFC) as input parameters, with AR as the output parameter. Based on RF tuning experience and the literature, this study selected the number of trees (n_estimators), the maximum tree depth (max_depth) and the minimum number of samples per leaf node (min_samples_leaf) as the three main hyper-parameters to tune. The developed AR prediction models were evaluated using RMSE, R^{2}, and MAPE.
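A random 80%/20% division of this kind can be sketched in a few lines. The helper name, the fixed seed, and the placeholder rows are assumptions for illustration; the study's actual split and random state are not specified here.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Randomly divide rows into a training part and a test part."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_ratio)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


rows = list(range(100))  # placeholder records standing in for (inputs, AR) samples
train_rows, test_rows = train_test_split(rows)
```

Fixing the seed makes the division reproducible, so that every optimized model is trained and tested on exactly the same samples.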

As part of the model optimization, an exploration was conducted into the effect of population size, a crucial parameter in meta-heuristic algorithms [^{2} = 0.9937, RMSE = 0.0529, MAPE = 5.9810 for training, and R^{2} = 0.9844, RMSE = 0.081, MAPE = 11.8260 for testing).

GOGHS-RF | | | | | | | |
---|---|---|---|---|---|---|---|
Training | | | | | | | |
Swarm | R^{2} | Score | RMSE | Score | MAPE | Score | Total |
20 | 0.9914 | 1 | 0.0619 | 1 | 7.4338 | 1 | 3 |
40 | 0.9932 | 3 | 0.0548 | 3 | 5.8140 | 5 | 11 |
80 | 0.9915 | 2 | 0.0614 | 2 | 7.0174 | 2 | 6 |
100 | 0.9937 | 5 | 0.0529 | 5 | 5.9810 | 4 | 14 |
200 | 0.9934 | 4 | 0.054 | 4 | 6.2803 | 3 | 11 |
Testing | | | | | | | |
Swarm | R^{2} | Score | RMSE | Score | MAPE | Score | Total |
20 | 0.9839 | 3 | 0.0822 | 3 | 11.8644 | 2 | 8 |
40 | 0.9833 | 2 | 0.0837 | 2 | 12.3044 | 1 | 5 |
80 | 0.9844 | 5 | 0.081 | 5 | 11.8122 | 4 | 14 |
100 | 0.9844 | 5 | 0.081 | 5 | 11.8260 | 3 | 13 |
200 | 0.9841 | 4 | 0.0816 | 4 | 11.6210 | 5 | 13 |
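The score columns in the table above follow a simple rank-based scheme, which the sketch below reproduces. The tie rule (equal values share the same score, and the next distinct value drops by one) is inferred from the testing rows of the table rather than stated in the text, and the function names are illustrative.

```python
def rank_scores(values, higher_is_better):
    """Score each value by rank: the best gets len(values); ties share a score."""
    ordered = sorted(set(values), reverse=higher_is_better)  # best value first
    return [len(values) - ordered.index(v) for v in values]

def total_scores(r2, rmse, mape):
    """Sum the R2, RMSE and MAPE rank scores for each candidate."""
    parts = [rank_scores(r2, True), rank_scores(rmse, False), rank_scores(mape, False)]
    return [sum(column) for column in zip(*parts)]


# Swarm sizes 20, 40, 80, 100, 200 (values copied from the table above).
train_total = total_scores(
    r2=[0.9914, 0.9932, 0.9915, 0.9937, 0.9934],
    rmse=[0.0619, 0.0548, 0.0614, 0.0529, 0.054],
    mape=[7.4338, 5.8140, 7.0174, 5.9810, 6.2803],
)
test_total = total_scores(
    r2=[0.9839, 0.9833, 0.9844, 0.9844, 0.9841],
    rmse=[0.0822, 0.0837, 0.081, 0.081, 0.0816],
    mape=[11.8644, 12.3044, 11.8122, 11.8260, 11.6210],
)
```

Running the sketch reproduces the Total columns: 3, 11, 6, 14, 11 for training and 8, 5, 14, 13, 13 for testing. Adding the two stages gives the highest combined total (27) to a swarm size of 100.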

Building on the understanding of population size’s impact on the GOGHS-RF model, this research also developed three common RF-based meta-heuristic algorithm models for the same problem, namely GA-RF, ABC-RF, and PSO-RF. Then, the performance results of these optimized hybrid models and other single models were compared with the main predictive model in this study. It is worth noting that the modelling procedures of these optimization methods are not discussed in this article, and only their results are given. More details related to their modelling procedure and building are available in the original literatures. In all the above-mentioned hyper-parameter adjustment processes, a 5-fold cross-validation resampling technique was applied to increase model performance and reliability. The setting parameters of each optimization algorithm and their used hyper-parameter combinations for each model to predict TBM AR are listed in ^{2} = 0.9844, RMSE = 0.081 and MAPE = 11.8260) under the best parameter combination (n_estimators = 45, max_depth = 11 and min_samples_leaf = 2). In addition, the GA-RF model got the corresponding test results (R^{2} = 0.9777, RMSE = 0.0967 and MAPE = 13.9562) under the parameter combination (n_estimators = 356, max_depth = 23 and min_samples_leaf = 12); the PSO-RF model got the best test performance (R^{2} = 0.978, RMSE = 0.0961 and MAPE = 14.9889) under the parameter combination (n_estimators = 356, max_depth = 5 and min_samples_leaf = 1); and the ABC-RF model got the best test results (R^{2} = 0.9799, RMSE = 0.0919 and MAPE = 13.2602) according to the parameter combination (n_estimators = 155, max_depth = 13 and min_samples_leaf = 9).

Algorithm | Parameters | Value | Optimal parameters |
---|---|---|---|
GOGHS | Genetic mutation probability | 0.1 | n_estimators = 45; max_depth = 11; min_samples_leaf = 2 |
GA | Crossover probability; Mutation probability; Selection probability | 0.5; 0.25; 0.75 | n_estimators = 356; max_depth = 23; min_samples_leaf = 12 |
PSO | Cognitive coefficient1; Cognitive coefficient2; Inertia weight | 1.7; 1.7; 0.7 | n_estimators = 356; max_depth = 5; min_samples_leaf = 1 |
ABC | Number of trial limits | 10 | n_estimators = 155; max_depth = 13; min_samples_leaf = 9 |

Following the optimization outcomes of these models detailed in

Training | | | | | | | |
---|---|---|---|---|---|---|---|
Model | R^{2} | Score | RMSE | Score | MAPE | Score | Total |
GOGHS-RF | 0.9937 | 4 | 0.0529 | 4 | 5.9810 | 4 | 12 |
GA-RF | 0.9621 | 1 | 0.1298 | 1 | 14.0811 | 2 | 4 |
PSO-RF | 0.975 | 3 | 0.1053 | 3 | 15.0945 | 1 | 7 |
ABC-RF | 0.9676 | 2 | 0.1199 | 2 | 12.9691 | 3 | 7 |
Testing | | | | | | | |
Model | R^{2} | Score | RMSE | Score | MAPE | Score | Total |
GOGHS-RF | 0.9844 | 4 | 0.081 | 4 | 11.8260 | 4 | 12 |
GA-RF | 0.9777 | 1 | 0.0967 | 1 | 13.9562 | 2 | 4 |
PSO-RF | 0.978 | 2 | 0.0961 | 2 | 14.9889 | 1 | 5 |
ABC-RF | 0.9799 | 3 | 0.0919 | 3 | 13.2602 | 3 | 9 |

Furthermore, to evaluate the comprehensive performance of these hybrid models, scoring was done for performance on the training and test sets, and these scores are displayed in

Building on these comprehensive performance results, the scatter plot analyses presented were further conducted in ^{2}: 0.9937) and the test set (R^{2}: 0.9844) exhibit a high coefficient of determination; the RMSE value is very low (training set: 0.0529 and test set: 0.081). Additionally, the MAPE values are also very low for the GOGHS-RF model (training set: 5.9810 and test set: 11.8260). The performance comparison of the GOGHS-RF model with the other hybrid models indicates that the model is well-trained and effectively avoids under-fitting and over-fitting. The results show that the R^{2} values of the four hybrid models are generally above 0.96, representing a high level of prediction accuracy. Furthermore, the scatter plot analysis included two equations, one linear and one nonlinear, proposed by Armaghani et al. [

In view of the above discussion, this part vividly illustrates the effectiveness of these optimization methods for RF optimization from various perspectives in

To effectively compare the simulation effects of different models, relying solely on scatter diagrams may not be intuitive enough. The Taylor diagram is a way to intuitively compare models [

In intelligence and simulation work, it is very important to maintain and increase the performance prediction if a new model is proposed. In addition, it is also important to decrease the number of features or inputs compared to previous related works. Considering the previous studies similar to this one, it can be seen that the model presented in this study is more accurate and applicable. For example, the same data was used by Zhou et al. [^{2} = 0.962 and 0.972, and RMSE = 0.127 and 0.116 for train and test stages, respectively. In contrast, this study developed the GOGHS-RF model, which achieved R^{2} = 0.9937 and 0.9844 for the train and test stages, respectively. In another work, Armaghani et al. [^{2} = 0.958 and 0.961 for the train and test stages, respectively. The advantages of the present study lie in: (1) performance prediction, (2) the number of factors used as inputs. It is obvious that the developed GOGHS-RF model predicts TBM AR more accurately and better than the PSO-ANN model that Armaghani et al. [^{2} = 0.897 and 0.916 for train and test stages, respectively) compared to this study. Therefore, it can be concluded that this study and its newly developed model, i.e., GOGHS-RF, make a significant contribution to the literature and can be introduced as a practical and accurate technique in mechanized tunnel excavation.

The predictive accuracy of the GOGHS-RF model depends on the input variables used (RQD, UCS, RMR, BTS, WZ, TFC, and RPM). It is meaningful to identify relatively valid and relatively invalid parameters [_{i}, _{s} represents the vector of input features in set
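The idea behind Shapley-value attribution can be illustrated with an exact computation on a tiny model. The sketch below enumerates every feature subset and weights each marginal contribution with the classic Shapley weights. It makes two simplifying assumptions that are not claims about this study: features outside a subset are replaced by a fixed baseline value, and `toy_model` is an invented three-input function rather than the fitted GOGHS-RF model. Practical SHAP tooling relies on faster, model-specific estimators instead of full enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features are set to their baseline value."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for subsets S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


def toy_model(features):
    # Invented function with one interaction term, for illustration only.
    x0, x1, x2 = features
    return 2 * x0 + x0 * x1 - x2


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

For this toy model the interaction term is shared equally between the first two features, and the attributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that SHAP explanations rely on.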

The results of the significance of the inducing factors were obtained according to the Shapley value (

Following the evaluation of the overarching significance of the influencing factors using Shapley values, the analysis now shifts its lens to a finer scale. The investigation is further narrowed down to an individual test sample to probe the singular effects of these parameters on the prediction. This section performed a detailed analysis of the fifth test sample, and

To gain a deeper understanding of the interaction between variables, the most significant feature, i.e., TFC, can be plotted based on the SHAP values. According to

Previous studies have observed that the RF model has been applied as a standalone predictive technique in only a few instances of TBM construction research. Therefore, this study aimed to develop new metaheuristic optimization-based hybrid RF models to predict TBM performance (i.e., AR). Following the construction of the model and multiple tests, the GOGHS algorithm—with its enhanced global search capability based on HS improvement—was identified as the optimal algorithm for adjusting the hyperparameters of RF. The predictive capacity of the GOGHS-RF model was systematically verified using a variety of evaluation indicators and compared with three other metaheuristic-based models, i.e., GA-RF, PSO-RF, and ABC-RF. This comparison underscored the distinct advantages of the GOGHS algorithm over the other three tuning strategies.

In this context, the study further examined the performance of the GOGHS-RF model. Performance indexes (R^{2}, RMSE, and MAPE) and a comprehensive ranking system were used to assess modeling capacity. In the model comparison, GOGHS-RF achieved an R^{2} of 0.9937 on the training set and 0.9844 on the test set. The corresponding RMSE values were 0.0529 (training) and 0.081 (test), and the MAPE values were 5.9810 (training) and 11.8260 (test). The results indicated an excellent regression effect, and the prediction error of GOGHS-RF on the entire TBM dataset was small, demonstrating its strong learning performance. Based on these metric values, the GOGHS-RF model obtained the best overall prediction performance among all proposed RF-based models. This highlights the strong optimum-seeking capability of the GOGHS method in this study and its high potential for engineering applications. Moreover, a Taylor diagram was used to compare the performances of CatBoost, ANN, SVM, AdaBoost, and the hybrid RF models, with the results also showing the excellent performance of GOGHS-RF. Lastly, Shapley values were employed to analyze the relative importance of the variables influencing TBM AR. Based on the optimal RF model, the input parameter importance was scored and ranked, with the results revealing that TFC had the highest importance score of 0.4245. TFC was found to be the most crucial factor affecting TBM AR, aligning with previous findings and the relevance of the input parameter statistics.

In the end, intelligent hybrid RF-based models have shown significant promise in predicting TBM excavation performance. Among them, the proposed GOGHS-RF model exhibited satisfactory data learning capability and predictive performance. Despite limitations and shortcomings, such as poor data quality and an insufficient sample size, the approach holds potential for broader application in rock mechanics and engineering geology. Future studies could address these limitations by implementing data cleaning techniques, feature selection, and incorporating more available data to enhance model performance.

The authors express their appreciation to the National Natural Science Foundation of China, and the Distinguished Youth Science Foundation of Hunan Province of China.

This research was funded by the National Natural Science Foundation of China (Grant 42177164), and the Distinguished Youth Science Foundation of Hunan Province of China (2022JJ10073).

Study conception and design: Yingui Qiu, Jian Zhou; data collection: Danial Jahed Armaghani; methodology: Yingui Qiu, Shuai Huang; visualization analysis: Yingui Qiu, Jian Zhou; analysis and interpretation of results: Yingui Qiu; writing—review & editing: Yingui Qiu, Danial Jahed Armaghani, Biswajeet Pradhan, Annan Zhou, Jian Zhou; draft manuscript preparation: Yingui Qiu, Shuai Huang, Danial Jahed Armaghani, Biswajeet Pradhan, Annan Zhou, Jian Zhou. All authors reviewed the results and approved the final version of the manuscript.

All relevant data generated throughout this study are included in this article.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.