The uniaxial compressive strength (UCS) of rock is an essential material property in many applications, such as rock slopes, tunnel construction, and foundations. Obtaining UCS values directly in the laboratory takes enormous time and effort. Accordingly, indirect determination of UCS through several rock index tests that are easy and fast to carry out is of interest and importance. This study presents a powerful boosting-tree evaluation framework, comprising the adaptive boosting machine, the extreme gradient boosting machine (XGBoost), and the category gradient boosting machine, for estimating the UCS of sandstone. The Schmidt hammer rebound number, P-wave velocity, and point load index were chosen as predictor variables for forecasting the UCS of sandstone samples. Taylor diagrams and five regression metrics, including the coefficient of determination (R^{2}), root mean square error, mean absolute error, variance account for, and A-20 index, were used to evaluate and compare the performance of these boosting trees. The results showed that the proposed boosting trees provide a high level of predictive capacity for the prepared database. In particular, XGBoost is the best model for predicting sandstone strength, achieving a training R^{2} of 0.999 and a testing R^{2} of 0.958. The proposed model outperformed a neural network with optimization techniques during both the training and testing phases. A variable importance analysis reveals that the point load index has a significant influence on predicting the UCS of sandstone.
Keywords: uniaxial compressive strength; rock index tests; machine learning techniques; boosting tree

Introduction
The uniaxial compressive strength (UCS) of rock is the maximum compressive stress that rock can bear before failure under uniaxial compressive load [1]. It is one of the most basic mechanical parameters of rock mass in engineering investigation [2,3]. UCS has been widely recognized in rock foundation design [4], tunnel surrounding rock classification [5], rock mass quality evaluation [6], etc. The direct way to obtain the UCS of rock must accord with the suggestions of the International Society for Rock Mechanics (ISRM) [1]: rock blocks are processed into standard specimens, and rock tests are carried out in the laboratory. However, this measurement process is restricted by many conditions. For example, rock samples are required to be intact and should not contain joints and fissures. Furthermore, rock sampling, specimen processing, and transportation face strict restrictions, and it is challenging to obtain an ideal rock core in highly fractured, weak, and weathered rock masses. Moreover, conducting the rock tests to obtain UCS is time-consuming and expensive [3,7,8]. Accordingly, it is requisite to find an economical and easy method to estimate the UCS of rock accurately [9].
Aladejare et al. [10] summarized the empirical prediction methodologies of UCS in rock. Some empirical equations for predicting UCS are listed in Table 1. The empirical estimation methods adopt simple regression analysis to fit the correlation between single or multiple physical or other mechanical parameters and the UCS of rock. The physical parameters include the Equotip hardness number [11], Schmidt hammer rebound number (N) [12], Shore hardness [13], density (ρ) [14], porosity (n) [15], P-wave velocity (VP) [16], S-wave velocity (VS) [17], unit weight (γ) [18], and slake durability index (SDI) [19]. The mechanical parameters used to predict UCS are easier to obtain than the UCS itself, and they comprise the block punch index (BPI) [12], Young's modulus (E) [20], Poisson's ratio (ν), Brazilian tensile strength (BTS) [14], point load strength (Is(50)) [14,15], and other properties. The empirical prediction equations are simple and effortless to use on-site. Nevertheless, they are only effective for certain rock and geological conditions [10].
Table 1. Simple empirical equations for estimation of UCS

| No. | Equation | R² | Rock type | Reference |
|---|---|---|---|---|
| 1 | UCS = 0.032VP − 44.227 | 0.83 | Multiple rocks | Mohamad et al. [21] |
| 2 | UCS = 6.6VP^1.6 | 0.92 | Sedimentary | Uyanık et al. [17] |
| 3 | UCS = 0.91VP − 4500.6 | 0.87 | Sedimentary | Aliyu et al. [14] |
| 4 | UCS = 5.3466N − 99.878 | 0.76 | Sedimentary | Heidari et al. [12] |
| 5 | UCS = −47454.4 + 35905.6ρ − 671.68ρ^2 | 0.90 | Sedimentary | Aliyu et al. [14] |
| 6 | UCS = 149.33n^(−0.53) | 0.89 | Metamorphic | Fereidooni et al. [15] |
| 7 | UCS = 8.9217BPI − 1.2334 | 0.77 | Sedimentary | Heidari et al. [12] |
| 8 | UCS = 23.49BPI^0.68 | 0.82 | Igneous | Kallu et al. [22] |
| 9 | UCS = 12.8 × (E/10)^1.32 | 0.59 | Sedimentary | Najibi et al. [20] |
| 10 | UCS = 15.361BTS − 10.303 | 0.82 | Multiple rocks | Mohamad et al. [21] |
| 11 | UCS = 6.75BTS^1.08 | 0.80 | Igneous | Kallu et al. [22] |
| 12 | UCS = 10.03BTS + 55.19 | 0.92 | Metamorphic | Fereidooni et al. [15] |
| 13 | UCS = 10.4BTS + 18.2 | 0.63 | Sedimentary | Aliyu et al. [14] |
| 14 | UCS = 12.291Is(50) + 5.892 | 0.96 | Multiple rocks | Mohamad et al. [21] |
| 15 | UCS = 4.792Is(50) + 44.37 | 0.75 | Metamorphic | Tandon et al. [23] |
| 16 | UCS = 5.602Is(50) + 4.380 | 0.96 | Igneous | Tandon et al. [23] |
| 17 | UCS = 17.6Is(50) + 13.5 | 0.88 | Sedimentary | Aliyu et al. [14] |
Apart from empirical equations, multiple regression analyses and their results have been widely suggested in the literature, as shown in Table 2. Jalali et al. [24] applied N, BPI, Is(50), and VP to establish a multiple linear regression (MLR) for predicting the UCS of sedimentary rocks. Armaghani et al. [25] fitted an empirical equation considering ρ, SDI, and BTS. Uyanık et al. [17] built an equation to estimate the UCS of sedimentary rocks based on VP and VS. Teymen et al. [26] developed nine empirical equations adopting nine groups of input parameters to foretell the UCS of multiple rocks. Multiple regression analyses consider the effect of multiple variables and are better than empirical equations adopting only one variable. Nevertheless, multiple regression analyses cannot achieve satisfactory results for complex problems [26].
Table 2. Some multiple regression equations for estimating UCS of rock
With the development of artificial intelligence, intelligent techniques have been widely used to solve problems in science and engineering [32–41]. In civil engineering [42–44], they have been used in different fields such as the estimation of the sidewall displacement of underground caverns [45], the prediction of water inflow into drill and blast tunnels [46], the evaluation of the disc cutter life of tunnel boring machines [47], and so on. Additionally, artificial intelligence and machine learning (ML) were highlighted by researchers as effective and relatively accurate in predicting rock mass and material properties [48–52]. A fuzzy inference system (FIS) is a fuzzy information processing system based on fuzzy set theory and fuzzy inference. Fuzzy logic can reduce the uncertainty caused by unknowns and variability, which promotes the application of FIS in rock mechanics [53]. The FISs widely used to predict UCS can be divided into the Sugeno FIS [12,54], the Mamdani FIS [54–56], and the adaptive neuro-fuzzy inference system (ANFIS) [57–59]. FIS is simple in structure and very effective in uncertain environments. However, the prediction results of FIS are likely to be based on uncertain assumptions, which leads to inaccurate prediction results under some conditions.
Genetic programming (GP) and gene expression programming (GEP) are parts of evolutionary computation, and they are based on the genetic algorithm (GA). GEP and GP adopt a generalized hierarchical computer program to describe a problem. Individuals are formed from terminal and function symbols, which differs from GA. Wang et al. [60] adopted GEP to build the relationship between N and UCS, and the obtained equation was validated in practical engineering. İnce et al. [61] employed GEP to build a model based on Is(50), n, and ρ for estimating UCS, and the results showed that GEP was preferable for predicting the UCS of rock. Özdemir et al. [62] utilized GP to foretell the UCS of rock with input parameters VP, n, and N, and GP generated a satisfactory equation for predicting UCS. GEP and GP can give an explicit relationship between the input variables and UCS, but the optimal model cannot be obtained if their parameters, such as the mutation rate and population number, are improper.
ML is the leading method to implement artificial intelligence, and it can be divided into supervised learning and unsupervised learning. Based on statistics, ML builds the nonlinear mappings between input and output variables by analyzing the complex internal relationships behind data. Supervised learning models are frequently used to predict the UCS of rock, and they include artificial neural networks (ANN), support vector machines (SVM), k-nearest neighbors (KNN), Gaussian regression, regression trees, and ensemble models. ML has a strong ability to extract information from data, and it has been increasingly applied to the prediction of the UCS of rock recently. For instance, Rahman et al. [63] adopted the neural network to fit the relationship between VP and UCS in different rock types. Cao et al. [64] applied the extreme gradient boosting machine (XGBoost) to predict the UCS of granite based on physical parameters and mineral percentages, and XGBoost yielded better estimation results than SVM and ANN. Gowida et al. [65] implemented SVM to foretell the UCS of rock in real time based on six drilling mechanical parameters. Mahmoodzadeh et al. [66] utilized the Gaussian process to evaluate the UCS of rock based on n, N, VP, and Is(50), and the Gaussian process performed better than other models. ML techniques have a powerful ability to extract the relationships behind datasets, but their capacities rely on the quality of the datasets and hyperparameters.
As a crucial part of ML, boosting tree models have been increasingly used in geotechnical engineering, such as rockburst prediction [67–71], tunnel boring machine advance prediction [72], blast-induced ground vibration [73], and so on. Boosting trees show more outstanding performance than other models, such as ANN, SVM, etc. [69,74]. However, no studies have applied and compared boosting trees for predicting the UCS of rock. To fill this gap, in this paper, three boosting tree models, the adaptive boosting machine (AdaBoost), XGBoost, and the category gradient boosting machine (CatBoost), are introduced to build intelligent models for predicting the UCS of sandstone. The three models are developed and evaluated to compare their performance and choose an optimal model for estimating the UCS of sandstone.
Tree-Based Models

AdaBoost
Boosting is a strategy to build ensemble models: it trains multiple weak learners on the training set and combines these weak learners into a strong model. AdaBoost was proposed by Freund et al. [75]; it is suitable for regression and classification and can improve the capability of a single tree. Below, AdaBoost for regression is introduced in detail.
As shown in Fig. 1, before performing the regression task, the number of trees (i.e., the number of iterations) must be determined. Firstly, the weight of each sample in the training set is initialized; if the total number of samples is m, the initial weight of each sample is 1/m. Then, the weak regression trees are built. The maximum and relative errors on the samples are calculated, the relative error is used to determine the learning rate, and the learning rate is adopted to calculate the weight coefficients of the weak learners. The distribution of training samples is updated according to the weight coefficients. Finally, these weak regression trees are combined: the predictions of the weak regressors are sorted by their weight coefficients, and the final strong regressor outputs the weighted median.
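The AdaBoost.R2 procedure above can be sketched with scikit-learn's AdaBoostRegressor. The data here are synthetic stand-ins (the paper's database is not reproduced), and the hyperparameter values mirror those tuned later in this study.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Synthetic stand-in for the rock database: three index properties mapped to a
# strength-like target (the real N, VP, Is(50) data are not reproduced here).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 3))
y = 10.0 * X[:, 0] + 5.0 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(0.0, 0.1, 100)

# AdaBoost.R2 for regression: each weak tree is fit on reweighted samples, and
# the linear loss drives the per-sample weight update described above.
model = AdaBoostRegressor(n_estimators=95, learning_rate=1.0,
                          loss="linear", random_state=0)
model.fit(X, y)
print(f"training R^2 = {model.score(X, y):.3f}")
```

The weighted-median combination of the weak regressors happens inside `predict`; only the number of trees, learning rate, and loss need to be chosen.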
Fig. 1. The flowchart to build AdaBoost models

XGBoost
Gradient boosting [76] is the enhancement of AdaBoost, which is applicable to any differentiable loss functions. The negative gradient of the loss function in the current model is used to train a new weak learner, and then the trained weak learner is added to the existing model.
XGBoost is the development of gradient boosting [77], and it employs the Taylor second-order expansion of the loss function and adds the regularization term to control the complexity of the model. Fig. 2 shows the steps to build XGBoost. The loss function in XGBoost can be expressed as Eq. (1).
Obj^(t) = ∑_{i=1}^{n} l(y_i, ŷ_i^(t−1) + f_t(x_i)) + Ω(f_t) + C

where Obj^(t) represents the loss function at the t-th iteration, y_i denotes the actual value of the i-th sample, ŷ_i^(t−1) is the predicted value of the model at the (t−1)-th iteration, f_t is the tree added at the t-th iteration, l(⋅) is the loss function, Ω(f_t) is the regularization term, and C is a constant.
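A minimal NumPy illustration of Eq. (1) for the squared loss: the second-order Taylor terms reduce to a per-sample gradient and Hessian, and for a single-leaf tree the regularized objective has a closed-form optimal leaf weight. The data and the λ value are illustrative, not from the paper.

```python
import numpy as np

# One XGBoost-style boosting step for squared loss, written out with the
# second-order Taylor terms of Eq. (1): per-sample gradient g_i and Hessian h_i.
rng = np.random.default_rng(1)
y = rng.normal(50.0, 10.0, 200)       # stand-in UCS values (MPa)
y_pred = np.zeros_like(y)             # current model output before this step

g = y_pred - y                        # first derivative of 0.5*(y - y_pred)^2
h = np.ones_like(y)                   # second derivative (constant for squared loss)
lam = 1.0                             # L2 regularization on the leaf weight

# For a single-leaf tree, minimizing the regularized Taylor objective gives
# the closed-form leaf weight w* = -sum(g) / (sum(h) + lambda).
w_star = -g.sum() / (h.sum() + lam)
y_new = y_pred + w_star

loss_before = 0.5 * np.mean((y - y_pred) ** 2)
loss_after = 0.5 * np.mean((y - y_new) ** 2)
```

A real XGBoost tree repeats this leaf-weight computation for every leaf of every candidate split, which is why the gradient and Hessian statistics are all it needs from the loss.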
Fig. 2. The flowchart to develop XGBoost models

CatBoost
CatBoost was proposed by Yandex in 2017 [78]; it is based on gradient boosting and can handle categorical data. CatBoost converts categorical data to numerical data in a way that prevents overfitting [79], and it effectively processes categorical data after performing random permutations. By training different base learners with multiple permutations, CatBoost obtains unbiased estimates of the gradients, reducing the impact of gradient bias and improving robustness.
Fig. 3 displays the flowchart to construct CatBoost. Oblivious trees are chosen as the base learners in CatBoost; in these trees, all nodes in the same layer share the same splitting condition. Oblivious trees are relatively simple and improve the prediction speed when fitting the model. CatBoost has fewer hyperparameters and better robustness, and it is easy to use.
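The ordered-target-statistic idea described above can be sketched in plain NumPy: each sample's categorical value is encoded using only the targets of samples that precede it in a random permutation, so a sample's own target never leaks into its feature. All names and values here are illustrative, not CatBoost's internal API.

```python
import numpy as np

# Ordered target statistics: encode a categorical value for sample i using only
# samples that precede i in a random permutation, preventing target leakage.
rng = np.random.default_rng(2)
cats = rng.integers(0, 3, size=12)        # categorical feature with 3 levels
y = cats * 10.0 + rng.normal(0.0, 1.0, 12)  # target correlated with the category

perm = rng.permutation(len(y))
prior = y.mean()                          # smoothing prior for unseen categories
encoded = np.empty_like(y)
sums, counts = {}, {}
for i in perm:                            # walk samples in permutation order
    c = cats[i]
    s, n = sums.get(c, 0.0), counts.get(c, 0)
    encoded[i] = (s + prior) / (n + 1)    # smoothed mean of earlier targets only
    sums[c] = s + y[i]
    counts[c] = n + 1
```

The first sample in the permutation has seen no history, so it receives exactly the prior; later samples receive increasingly reliable category means.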
Fig. 3. The flowchart to construct CatBoost models

Database

Data Source
The data used in this study is the same data applied by Armaghani et al. [80]. The data was collected from Dengkil, Selangor, Malaysia. Sandstone, composed of 85% quartz and 15% clay minerals, is the primary rock in this area. To develop the boosting trees, 108 sandstone blocks were sampled in the field, and these blocks were cored and processed into standard samples according to the suggestions of the ISRM [1]. The prepared samples were subjected to rock mechanics testing in the laboratory. In total, 108 samples with N, VP, Is(50), and UCS were obtained to build the database; N, VP, and Is(50) are the input parameters for predicting UCS.
Data Description
The database was statistically analyzed. Table 3 lists the statistical information of the collected database, including the range, mean value, standard deviation, and quantiles of each variable. UCS is between 23.2 and 66.8 MPa, so the rock belongs to the low-to-medium strength class according to the ISRM, as shown in Fig. 4. The skewness of the input and output variables is not zero, indicating that the data distributions are asymmetrical. The kurtosis is less than zero, demonstrating that the database is dispersive. The scatter distributions between any two variables are displayed in Fig. 5. Fig. 6 shows the box plots of the four parameters. The mean values of the four variables are greater than their medians, and the box plots show right-skewed distributions. Eq. (2) is applied to calculate the correlation coefficient among all parameters, and Fig. 7 exhibits the heatmap of the results. In the heatmap, darker colors indicate higher correlations. It can be seen that the four parameters are positively correlated, and UCS has a strong correlation with VP and Is(50).
r = (N∑x_i y_i − ∑x_i ∑y_i) / (√(N∑x_i² − (∑x_i)²) √(N∑y_i² − (∑y_i)²))
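Eq. (2) can be checked numerically against NumPy's built-in Pearson correlation; the data below are synthetic stand-ins for any pair of database columns.

```python
import numpy as np

# Pearson correlation computed directly from Eq. (2), checked against NumPy.
rng = np.random.default_rng(3)
x = rng.uniform(20.0, 45.0, 50)            # e.g. Schmidt rebound numbers
y = 1.5 * x + rng.normal(0.0, 3.0, 50)     # correlated stand-in for UCS

N = len(x)
num = N * np.sum(x * y) - np.sum(x) * np.sum(y)
den = (np.sqrt(N * np.sum(x**2) - np.sum(x)**2)
       * np.sqrt(N * np.sum(y**2) - np.sum(y)**2))
r = num / den
```

The same value is returned by `np.corrcoef(x, y)[0, 1]`, which is how the heatmap in Fig. 7 can be reproduced column-pair by column-pair.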
Table 3. The statistical information of the collected database

| Statistical indicators | N | VP (m/s) | Is(50) (MPa) | UCS (MPa) |
|---|---|---|---|---|
| Mean value | 31.03 | 2413.19 | 2.51 | 47.68 |
| Median | 30.05 | 2401.50 | 2.48 | 47.20 |
| Min value | 19.40 | 1570.60 | 1.23 | 23.20 |
| Max value | 43.50 | 3063.41 | 4.15 | 66.80 |
| Standard deviation | 6.85 | 395.26 | 0.74 | 11.87 |
| 25th percentile | 25.60 | 2102.50 | 1.99 | 37.88 |
| 50th percentile | 30.05 | 2401.50 | 2.48 | 47.20 |
| 75th percentile | 37.60 | 2754.50 | 3.16 | 57.95 |
| Skewness | 0.16 | −0.24 | 0.26 | −0.19 |
| Kurtosis | −1.18 | −1.08 | −0.78 | −1.03 |
Fig. 4. The rock classification based on UCS suggested by the ISRM [81]
Fig. 5. The scatter and histogram distributions of the database
Fig. 6. The box plots of four variables
Fig. 7. The heatmap of the correlation coefficients between variables

Step-by-Step Study Flowchart
The database was established to construct the tree-based models for foretelling the UCS of sandstone. According to Fig. 8, the database is randomly split into two portions: one portion, accounting for 80% of the database, is adopted to train the tree-based models, and the other portion, accounting for 20%, is utilized to evaluate the capabilities of the models. The regression trees are developed, and three different boosting strategies are implemented to combine these trees into the final ensemble models. A ranking system composed of five regression metrics is introduced to evaluate the performance of the three models during the training and testing stages. AdaBoost, XGBoost, and CatBoost are ranked and compared according to this system. Finally, the relative importance of the input parameters in the three models is calculated based on the principles of tree growth.
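The 80/20 random split described above can be reproduced with scikit-learn on a stand-in array of 108 samples (the database size in this study); the feature values and random seed are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 108-sample database: columns play the role of N, VP, Is(50).
rng = np.random.default_rng(4)
X = rng.uniform(size=(108, 3))
y = rng.uniform(23.2, 66.8, 108)   # UCS range reported for the database (MPa)

# Random 80/20 split: 86 samples for training, 22 for testing.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
```

With 108 samples and `test_size=0.2`, scikit-learn rounds the test set up to 22 rows, leaving 86 for training, matching the counts used throughout the paper.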
Fig. 8. The technical flowchart to build tree-based models for predicting UCS in sandstone

Modeling
For developing the tree-based models, the database is divided into a training part (80%) and a testing part (20%). The training part includes 86 samples and is used to train AdaBoost, XGBoost, and CatBoost. Eq. (3) is adopted to normalize the input data. Three Python libraries, Scikit-learn [82], XGBoost [77], and CatBoost [78], are applied to develop the AdaBoost, XGBoost, and CatBoost models, respectively.

X_norm = (X − X_min) / (X_max − X_min)

where X is the original input parameter, X_max represents the maximum value of the input parameter, X_min stands for the minimum value of the input parameter, and X_norm denotes the normalized parameter.
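A small sketch of the min-max scaling of Eq. (3), applied column-wise; the sample rows reuse the extreme and mean values from Table 3 purely for illustration.

```python
import numpy as np

# Min-max scaling of Eq. (3), applied independently to each input column.
# Rows are samples; columns are N, VP (m/s), Is(50) (MPa), values from Table 3.
X = np.array([[19.40, 1570.60, 1.23],
              [31.03, 2413.19, 2.51],
              [43.50, 3063.41, 4.15]])

X_min = X.min(axis=0)              # per-column minimum
X_max = X.max(axis=0)              # per-column maximum
X_norm = (X - X_min) / (X_max - X_min)
```

After scaling, every column spans exactly [0, 1], so no single input dominates purely because of its physical units.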
The regression trees are the base learners in the three models, and the number of trees controls the capacity and complexity of the model. The number of trees needs to be reasonably determined to prevent overfitting; for simplicity, the other hyperparameters use the default values in the Python libraries. In AdaBoost, the distribution of the 86 training samples is initialized, and the first tree is developed. Then, the linear loss function is used to evaluate the error between the predicted and actual UCS. The learning rate is set to 1, indicating no shrinkage when updating the model. Afterward, trees are added to AdaBoost to continuously minimize the error. Fig. 9 shows the R^{2} variation with the increasing number of trees. When the number of trees reaches 95, AdaBoost has the highest R^{2} and lowest error. Accordingly, the number of trees in AdaBoost is set to 95. Table 4 lists the primary hyperparameters of AdaBoost in this study. After building all the trees, AdaBoost combines the outcomes of the 95 trees as the final output.
Fig. 9. The R^{2} variation with the increasing number of trees during the training process in AdaBoost

Table 4. The hyperparameters in AdaBoost
| Hyperparameters | Value |
|---|---|
| The number of trees | 95 |
| Learning rate | 1 |
| Loss function | Linear |
The training process of XGBoost is similar to that of AdaBoost: trees are appended in sequence to reduce the error. The learning rate is 0.3, which specifies the shrunk step size when updating the model. The maximum tree depth controls the complexity and is set to 6. Additionally, XGBoost adds regularization terms to prevent overfitting and improve generalization. Table 5 presents these parameter values. Trees are added to XGBoost one by one, from 0 to 100. Fig. 10 shows the R^{2} variation, and the curve is smooth. After the number of trees reaches 35, the training R^{2} no longer varies. Therefore, the number of trees is set to 35.
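The tree-count sweep described above can be sketched as follows; scikit-learn's GradientBoostingRegressor stands in for XGBoost (same learning rate and depth), and the synthetic data are illustrative rather than the paper's database.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the 86 training samples.
rng = np.random.default_rng(5)
X = rng.uniform(size=(86, 3))
y = 30.0 * X[:, 0] + 15.0 * X[:, 1] + 5.0 * X[:, 2] + rng.normal(0.0, 0.5, 86)

# Sweep the number of trees and record the training R^2, as in Fig. 10.
grid = list(range(5, 105, 5))
scores = []
for n in grid:
    model = GradientBoostingRegressor(n_estimators=n, learning_rate=0.3,
                                      max_depth=6, random_state=0)
    scores.append(model.fit(X, y).score(X, y))

best_n = grid[int(np.argmax(scores))]   # smallest sweep value could also be
                                        # chosen once the curve plateaus
```

Training R^{2} is monotone in the tree count, so in practice the plateau point (35 trees in the paper) is chosen rather than the raw maximum, to keep the model as small as possible.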
Table 5. The hyperparameters in XGBoost

| Hyperparameters | Value |
|---|---|
| Learning rate | 0.3 |
| The number of trees (number of iterations) | 35 |
| The maximum depth in trees | 6 |
| L1 regularization | 0 |
| L2 regularization | 1 |
Fig. 10. The R^{2} variation with the increasing number of trees during the training process in XGBoost
Compared to XGBoost and AdaBoost, CatBoost can automatically determine the learning rate according to the training set and iteration number, and the automatically determined value is close to optimal. Additionally, the oblivious tree is adopted as the base learner, and its depth is set to 6. CatBoost also adds random strength, which is used to avoid overfitting. The default number of iterations is 1000 in the Python CatBoost library. To find an appropriate iteration number, the iterations increase from 10 to 1000 in steps of 10. Fig. 11 depicts the R^{2} variation during the training process in CatBoost. When the iterations reach 1000, R^{2} is at its maximum. Accordingly, the number of iterations is set to 1000, and the automatically determined learning rate is 0.025. Table 6 lists the primary parameters used to develop the CatBoost model for predicting UCS in sandstone.
Fig. 11. The R^{2} variation with the increasing number of trees during the training process in CatBoost

Table 6. The hyperparameters in CatBoost
| Hyperparameters | Value |
|---|---|
| Learning rate | 0.025 |
| Iterations | 1000 |
| The tree depth | 6 |
| L2 regularization | 3 |
| Random strength | 1 |
Results and Discussion

Model Performance Evaluation
AdaBoost, XGBoost, and CatBoost are built according to the 86 training samples and their corresponding parameters. The remaining 22 testing samples are utilized to evaluate the performance of the three models. R^{2}, root mean square error (RMSE), mean absolute error (MAE), variance account for (VAF), and A-20 index are calculated according to the predicted and measured UCS. These five indicators are widely recognized as the regression evaluation index [83–87]. Eqs. (4)–(7) show the equations for computing the RMSE, MAE, VAF, and A-20 index, respectively.
RMSE = √((1/N)∑_{i=1}^{N}(ŷ_i − y_i)²)

MAE = (1/N)∑_{i=1}^{N}|ŷ_i − y_i|

VAF = [1 − var(y_i − ŷ_i)/var(y_i)] × 100

A-20 = m20/N

where var(⋅) denotes the variance, and m20 is the number of samples whose ratio of predicted to actual value falls in the range (0.8, 1.2). For R^{2}, VAF, and the A-20 index, larger values indicate better prediction performance; for RMSE and MAE, values closer to 0 indicate a superior model. When the predicted values are exactly equal to the actual values, R^{2} and A-20 are 1, RMSE and MAE are 0, and VAF is 100%.
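Eqs. (4)–(7) translate directly into NumPy; the short arrays below are illustrative stand-ins within the UCS range of the database.

```python
import numpy as np

# The four metrics of Eqs. (4)-(7) in NumPy.
def rmse(y, yhat):
    return np.sqrt(np.mean((yhat - y) ** 2))

def mae(y, yhat):
    return np.mean(np.abs(yhat - y))

def vaf(y, yhat):
    return (1 - np.var(y - yhat) / np.var(y)) * 100

def a20(y, yhat):
    ratio = yhat / y                       # predicted-to-actual ratio
    return np.mean((ratio >= 0.8) & (ratio <= 1.2))

# Illustrative values spanning the 23.2-66.8 MPa range of the database.
y_true = np.array([23.2, 35.0, 47.2, 58.0, 66.8])
y_pred = np.array([25.0, 34.0, 48.0, 55.0, 70.0])
```

A perfect predictor gives RMSE = MAE = 0, VAF = 100, and A-20 = 1, which matches the limiting behavior stated above.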
Figs. 12–14 exhibit the training and testing results of AdaBoost, XGBoost, and CatBoost, respectively. In these figures, the horizontal axis represents the actual UCS, and the vertical axis represents the predicted UCS. When a predicted value equals the actual value, the corresponding point falls on the red line; the closer the points are to the red line, the better the estimation performance of the model. The points of XGBoost are closest to the red line, so XGBoost has the optimal capability. Additionally, points between the two purple dotted lines have predicted values greater than 0.8 times and less than 1.2 times the actual values. Only points predicted by AdaBoost lie outside the two purple dotted lines, so its performance is the worst.
Fig. 12. The training and testing results in AdaBoost
Fig. 13. The training and testing results in XGBoost
Fig. 14. The training and testing results in CatBoost
The Taylor diagrams [88] are introduced to analyze the training and testing results of the three models, as shown in Fig. 15. Taylor diagrams combine the correlation coefficient, centered RMSE, and standard deviation into one polar diagram according to their cosine relationship (Eq. (8)). In Fig. 15, the distance from the origin represents the standard deviation, and the azimuthal angle represents the correlation coefficient. It can be seen that the standard deviations of the UCS predicted by the three models are lower than that of the actual UCS. Furthermore, the reference point with the pentastar shape reflects the actual UCS; points nearer to the reference have lower centered RMSE, and their corresponding models have superior capability. In both the training and testing stages, XGBoost performs best, followed by CatBoost, and finally AdaBoost.
Fig. 15. The Taylor diagrams of training and testing results
E′² = σ_p² + σ_a² − 2σ_pσ_aR

where E′ denotes the centered RMSE, σ_p is the standard deviation of the predicted values, σ_a is the standard deviation of the actual values, and R is the correlation coefficient.
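Eq. (8) can be verified numerically: the centered RMSE, standard deviations, and correlation coefficient of any paired series satisfy the identity exactly (up to floating-point error). The series below are synthetic.

```python
import numpy as np

# Numerical check of the Taylor-diagram identity of Eq. (8):
# E'^2 = sd_p^2 + sd_a^2 - 2 * sd_p * sd_a * R.
rng = np.random.default_rng(6)
a = rng.normal(47.0, 12.0, 200)            # "actual" UCS stand-in
p = 0.9 * a + rng.normal(0.0, 3.0, 200)    # "predicted" stand-in

sd_p, sd_a = p.std(), a.std()
R = np.corrcoef(p, a)[0, 1]

# Centered RMSE: RMS of the mean-removed differences.
centered_rmse = np.sqrt(np.mean(((p - p.mean()) - (a - a.mean())) ** 2))

lhs = centered_rmse ** 2
rhs = sd_p ** 2 + sd_a ** 2 - 2 * sd_p * sd_a * R
```

This is why a Taylor diagram works: fixing σ_a, the point (σ_p, R) determines E′ through the law of cosines, so all three statistics fit on one polar plot.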
A ranking system comprising R^{2}, RMSE, MAE, VAF, and the A-20 index is implemented to rank the three models comprehensively, considering performance in both the training and testing processes. Table 7 presents the ranking system. With three models, the score on each metric ranges from 1 to 3, and the model with better performance receives the higher score. For the training or testing dataset, the total score is the sum of the scores across the five metrics. The final score of a model is the sum of its total scores on the training and testing sets. The model with the higher final score has preferable capability on both training and testing samples. The comprehensive performance ranking is: XGBoost > CatBoost > AdaBoost.
Table 7. The ranking system for the three models

| Model | Dataset | R² | Score | MAE | Score | RMSE | Score | VAF (%) | Score | A-20 index | Score | Total score | Final score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AdaBoost | Training | 0.783 | 1 | 4.647 | 1 | 5.471 | 1 | 78.354 | 1 | 0.919 | 2 | 6 | 12 |
| AdaBoost | Testing | 0.794 | 1 | 4.33 | 1 | 5.459 | 1 | 79.473 | 1 | 0.909 | 2 | 6 | 12 |
| XGBoost | Training | 0.999 | 3 | 0.003 | 3 | 0.004 | 3 | 100 | 3 | 1 | 3 | 15 | 30 |
| XGBoost | Testing | 0.958 | 3 | 1.869 | 3 | 2.457 | 3 | 95.831 | 3 | 1 | 3 | 15 | 30 |
| CatBoost | Training | 0.988 | 2 | 0.948 | 2 | 1.307 | 2 | 98.765 | 2 | 1 | 3 | 11 | 22 |
| CatBoost | Testing | 0.886 | 2 | 3.085 | 2 | 4.06 | 2 | 88.62 | 2 | 1 | 3 | 11 | 22 |
Model Comparison
In the previous section, XGBoost was selected as the most accurate model in this research for predicting sandstone strength. In this section, XGBoost is compared with the best model proposed by Armaghani et al. [80], as shown in Table 8. In terms of R^{2}, RMSE, and VAF on the training and testing sets, XGBoost performs better than the imperialist competitive algorithm (ICA)-ANN. Moreover, although ICA-ANN utilized the ICA to tune the weights and biases of the ANN and had better capability than the ANN alone, the optimization process performed by Armaghani et al. [80] was complicated and time-consuming. By contrast, XGBoost has fewer parameters to tune, is easy to use, and is more powerful for predicting the UCS of sandstone samples. It is important to note that the ultimate aim of a predictive model for rock strength is a model that is accurate enough, easy to apply, and applicable in practice. Additionally, the performance of XGBoost for predicting the UCS of rock is compared with models proposed by other scholars recently, as shown in Table 9. XGBoost has a more powerful ability to predict UCS than these models.
Table 8. Results of the models by Armaghani et al. [80] to predict rock strength
| Model | Training R² | Training RMSE | Training VAF (%) | Testing R² | Testing RMSE | Testing VAF (%) |
|---|---|---|---|---|---|---|
| ICA-ANN | 0.949 | 2.602 | 94.769 | 0.940 | 2.997 | 93.915 |
| ANN | 0.850 | 4.492 | 85.001 | 0.769 | 6.093 | 76.386 |
Table 9. Some models to predict UCS developed by other scholars

| No. | Models | Input variables | R² |
|---|---|---|---|
| 1 | FIS [12] | Is(50), N, BPI, VP | 0.91 |
| 2 | GEP [26] | Is(50), BTS | 0.9047 (training), 0.9408 (testing) |
| 3 | ANN [26] | VP, BTS | 0.9223 (training), 0.9220 (testing) |
| 4 | ANFIS [26] | Shore hardness, BTS | 0.9149 (training), 0.9473 (testing) |
| 5 | DNN [66] | n, N, VP, Is(50) | 0.9017 |
| 6 | DT [66] | n, N, VP, Is(50) | 0.9491 |
| 7 | SVR [66] | n, N, VP, Is(50) | 0.9363 |
| 8 | M5P algorithm [89] | γ, N, n, VP, SDI | 0.89 |
| 9 | FIS [56] | n, BPI, BTS, VP | 0.923 (training), 0.853 (testing) |
| 10 | ANN [90] | N, VP, Is(50) | 0.867 (training), 0.886 (testing) |
| 11 | ANFIS [90] | N, VP, Is(50) | 0.956 (training), 0.946 (testing) |
| 12 | XGBoost | Is(50), N, VP | 0.999 (training), 0.958 (testing) |
Note: DNN = deep neural networks; DT = decision trees.
Model Validation
To validate the application of the proposed boosting trees, 14 sandstone blocks were processed into standard specimens, and N, VP, Is(50), and UCS were measured. N ranges from 13.3 to 34.7, VP from 2030 to 2960 m/s, Is(50) from 1 to 3.7 MPa, and UCS from 23 to 52 MPa. N, VP, and Is(50) were input to the developed XGBoost model, and the predicted UCS ranges from 30.2 to 62.8 MPa. Fig. 16 compares the predicted and measured UCS. When the developed XGBoost model is applied to these new datasets from other sandstone blocks, it achieves an R^{2} of 0.801 and an RMSE of 9.2833. The ratio of the measured UCS to the predicted UCS is between 0.67 and 1.02, so the model tends to overestimate the real UCS. The obtained results show that the proposed model has promising engineering applicability: it is able to predict the UCS of rock samples with an acceptable level of accuracy when a new set of input parameters (within the range of inputs used in this research) becomes available.
Fig. 16. The predicted results of the 14 validation datasets

The Relative Importance of Input Parameters
The relative importance of input features can be calculated during the growth of the trees [91]. The significant parameters have a crucial impact on the performance of the model, and obtaining the relative importance of the input parameters is beneficial to understanding the principles behind the model. Fig. 17a shows the relative importance of N, VP, and Is(50) in AdaBoost, XGBoost, and CatBoost. Although the importance ranking of the input parameters differs among the three models, Is(50) is always the most important variable. To determine the principal parameters affecting the UCS of sandstone, the importance score of each variable is averaged over the three models. Is(50) is the most essential, with an importance score of 0.47, followed by scores of 0.30 and 0.24 for VP and N, respectively, as shown in Fig. 17b. The individual conditional expectation (ICE) plot is introduced to determine the influence of the variables on the UCS predicted by XGBoost, as shown in Fig. 18. Each line shows how the predicted UCS of a sample varies when a variable of interest changes while the other variables are fixed. The purple line is the average of all lines, which shows the mean relationship between the variable and the predicted UCS. When VP and N are fixed, the UCS predicted by XGBoost rises with increasing Is(50). Similarly, the predicted UCS grows with increasing VP and N.
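The impurity-based relative importance used above can be sketched with scikit-learn's `feature_importances_`, here with GradientBoostingRegressor as a stand-in for the three boosting models and a synthetic target in which the third feature deliberately dominates.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic 108-sample stand-in where the Is(50)-like column carries the
# largest coefficient, mimicking the dominance found in the paper.
rng = np.random.default_rng(7)
X = rng.uniform(size=(108, 3))                     # columns: N, VP, Is50
y = 2.0 * X[:, 0] + 4.0 * X[:, 1] + 10.0 * X[:, 2] + rng.normal(0.0, 0.2, 108)

# Impurity-based importance: each feature's share of the total split gain
# accumulated during tree growth, normalized to sum to 1.
model = GradientBoostingRegressor(random_state=0).fit(X, y)
importance = dict(zip(["N", "VP", "Is50"], model.feature_importances_))
```

The scores sum to 1 by construction, so they are directly comparable to the averaged 0.47 / 0.30 / 0.24 split reported for Is(50), VP, and N.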
Fig. 17. The relative importance of input parameters: (a) The variable importance in three models; (b) The mean importance of variables
Fig. 18. The ICE plot to analyze the dependence of UCS on the variables

Conclusion
In this research, 108 samples were used to investigate the physical and mechanical properties of sandstone. Tree-based models were implemented to build intelligent models for predicting the UCS of sandstone based on the established database. Considering the training and testing performance evaluated by Taylor diagrams and the ranking system, XGBoost is the outstanding tree model for predicting the UCS of sandstone. The proposed XGBoost model has a stronger ability to learn the relationship between the considered factors and UCS than models developed by other researchers. Additionally, XGBoost has fewer parameters to tune than other models, such as ANN and GEP, and it is simple to use. The developed boosting-tree solution is suitable for practical engineering, such as mines, quarries, and tunnels, where the UCS of rock must be evaluated accurately and in a timely manner with non-destructive methods. However, the considered variables are limited, and only three parameters are applied to foretell UCS. In future work, combining XGBoost with optimization techniques may further improve the capacity to estimate UCS.
Funding Statement: The research was funded by Act 211 Government of the Russian Federation, Contract No. 02.A03.21.0011.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
References

[1] Ulusay, R. (2014).
[2] He, M., Zhang, Z., Zhu, J., Li, N. (2022). Correlation between the constant mi of Hoek-Brown criterion and porosity of intact rock.
[3] Xiao, P., Li, D., Zhao, G., Liu, M. (2021). Experimental and numerical analysis of mode I fracture process of rock by semi-circular bend specimen.
[4] Rezazadeh, S., Eslami, A. (2017). Empirical methods for determining shaft bearing capacity of semi-deep foundations socketed in rocks.
[5] Xue, Y., Kong, F., Li, S., Zhang, L., Zhou, B. et al. (2020). Using indirect testing methods to quickly acquire the rock strength and rock mass classification in tunnel engineering.
[6] Wang, H., Lin, H., Cao, P. (2017). Correlation of UCS rating with Schmidt hammer surface hardness for rock mass classification.
[7] He, M., Zhang, Z., Zhu, J., Li, N., Li, G. et al. (2021). Correlation between the rockburst proneness and friction characteristics of rock materials and a new method for rockburst proneness prediction: Field demonstration.
[8] Yang, B., He, M., Zhang, Z., Zhu, J., Chen, Y. (2022). A new criterion of strain rockburst in consideration of the plastic zone of tunnel surrounding rock.
[9] Aladejare, A. E. (2020). Evaluation of empirical estimation of uniaxial compressive strength of rock using measurements from index and physical tests.
[10] Aladejare, A. E., Alofe, E. D., Onifade, M., Lawal, A. I., Ozoji, T. M. et al. (2021). Empirical estimation of uniaxial compressive strength of rock: Database of simple, multiple, and artificial intelligence-based regressions.
[11] Corkum, A., Asiri, Y., El Naggar, H., Kinakin, D. (2018). The Leeb hardness test for rock: An updated methodology and UCS correlation.
[12] Heidari, M., Mohseni, H., Jalali, S. H. (2018). Prediction of uniaxial compressive strength of some sedimentary rocks by fuzzy and regression models.
[13] Dinçer, İ., Acar, A., Ural, S. (2008). Estimation of strength and deformation properties of Quaternary caliche deposits.
[14] Aliyu, M. M., Shang, J., Murphy, W., Lawrence, J. A., Collier, R. et al. (2019). Assessing the uniaxial compressive strength of extremely hard cryptocrystalline flint.
[15] Fereidooni, D. (2016). Determination of the geotechnical characteristics of hornfelsic rocks with a particular emphasis on the correlation between physical and mechanical properties.
[16] Rahman, T., Sarkar, K. (2021). Lithological control on the estimation of uniaxial compressive strength by the P-wave velocity using supervised and unsupervised learning.
[17] Uyanık, O., Sabbağ, N., Uyanık, N. A., Öncü, Z. (2019). Prediction of mechanical and physical properties of some sedimentary rocks from ultrasonic velocities.
[18] Török, Á., Vásárhelyi, B. (2010). The influence of fabric and water content on selected rock mechanical parameters of travertine, examples from Hungary.
[19] Sharma, L., Vishal, V., Singh, T. (2017). Developing novel models using neural networks and fuzzy systems for the prediction of strength of rocks from key geomechanical properties.
[20] Najibi, A. R., Ghafoori, M., Lashkaripour, G. R., Asef, M. R. (2015). Empirical relations between strength and static and dynamic elastic properties of Asmari and Sarvak limestones, two main oil reservoirs in Iran.
[21] Mohamad, E. T., Armaghani, D. J., Momeni, E., Abad, S. V. A. N. K. (2015). Prediction of the unconfined compressive strength of soft rocks: A PSO-based ANN approach.
[22] Kallu, R., Roghanchi, P. (2015). Correlations between direct and indirect strength test methods.
[23] Tandon, R. S., Gupta, V. (2015). Estimation of strength characteristics of different Himalayan rocks from Schmidt hammer rebound, point load index, and compressional wave velocity.
[24] Jalali, S. H., Heidari, M., Mohseni, H. (2017). Comparison of models for estimating uniaxial compressive strength of some sedimentary rocks from Qom Formation.
[25] Armaghani, D. J., Safari, V., Fahimifar, A., Monjezi, M., Mohammadi, M. A. (2018). Uniaxial compressive strength prediction through a new technique based on gene expression programming.
[26] Teymen, A., Mengüç, E. C. (2020). Comparative evaluation of different statistical tools for the prediction of uniaxial compressive strength of rocks.
[27] Aboutaleb, S., Behnia, M., Bagherpour, R., Bluekian, B. (2018). Using non-destructive tests for estimating uniaxial compressive strength and static Young's modulus of carbonate rocks via some modeling techniques.
[28] Madhubabu, N., Singh, P., Kainthola, A., Mahanta, B., Tripathy, A. et al. (2016). Prediction of compressive strength and elastic modulus of carbonate rocks.
[29] Ng, I. T., Yuen, K. V., Lau, C. H. (2015). Predictive model for uniaxial compressive strength for Grade III granitic rocks from Macao.
[30] Çobanoğlu, İ., Çelik, S. B. (2008). Estimation of uniaxial compressive strength from point load strength, Schmidt hardness and P-wave velocity.
[31] Azimian, A., Ajalloeian, R., Fatehi, L. (2014). An empirical correlation of uniaxial compressive strength with P-wave velocity and point load strength index on marly rocks using statistical method.
[32] Huang, J., Zhang, J., Gao, Y. (2022). Evaluating the clogging behavior of pervious concrete (PC) using the machine learning techniques.
[33] Asteris, P. G., Douvika, M. G., Karamani, C. A., Skentou, A. D., Chlichlia, K. et al. (2020). A novel heuristic algorithm for the modeling and risk assessment of the COVID-19 pandemic phenomenon.
[34] Luo, W., Yuan, D., Jin, D. L., Lu, P., Chen, J. (2021). Optimal control of slurry pressure during shield tunnelling based on random forest and particle swarm optimization.
[35] Asteris, P. G., Rizal, F. I. M., Koopialipoor, M., Roussis, P. C., Ferentinou, M. et al. (2022). Slope stability classification under seismic conditions using several tree-based intelligent techniques.
[36] Mahmood, W., Mohammed, A. S., Asteris, P. G., Kurda, R., Armaghani, D. J. (2022). Modeling flexural and compressive strengths behaviour of cement-grouted sands modified with water reducer polymer.
[37] Liao, J., Asteris, P. G., Cavaleri, L., Mohammed, A. S., Lemonis, M. E. et al. (2021).
Novel fuzzy-based optimization approaches for the prediction of ultimate axial load of circular concrete-filled steel tubes. Gavriilaki, E., Asteris, P. G., Touloumenidou, T., Koravou, E. E., Koutra, M.et al. (2021). Genetic justification of severe COVID-19 using a rigorous algorithm. Zeng, J., Asteris, P. G., Mamou, A. P., Mohammed, A. S., Golias, E. A.et al. (2021). The effectiveness of ensemble-neural network techniques to predict peak uplift resistance of buried pipes in reinforced sand. Yang, H. Q., Zeng, Y. Y., Lan, Y. F., Zhou, X. P. (2014). Analysis of the excavation damaged zone around a tunnel accounting for geostress and unloading. Yang, H., Wang, Z., Song, K. (2020). A new hybrid grey wolf optimizer-feature weighted-multiple kernel-support vector regression technique to predict TBM performance. Mahmoodzadeh, A., Mohammadi, M., Nariman Abdulhamid, S., Hashim Ibrahim, H., Farid Hama Ali, H.et al. (2021). Dynamic reduction of time and cost uncertainties in tunneling projects. Mahmoodzadeh, A., Mohammadi, M., Farid Hama Ali, H., Hashim Ibrahim, H., Nariman Abdulhamid, S.et al. (2021). Prediction of safety factors for slope stability: comparison of machine learning techniques. Mahmoodzadeh, A., Mohammadi, M., Ghafoor Salim, S., Farid Hama Ali, H., Hashim Ibrahim, H.et al. (2022). Machine learning techniques to predict rock strength parameters. Mahmoodzadeh, A., Mohammadi, M., Hashim Ibrahim, H., Gharrib Noori, K. M., Nariman Abdulhamid, S.et al. (2021). Forecasting sidewall displacement of underground caverns using machine learning techniques. Mahmoodzadeh, A., Mohammadi, M., Gharrib Noori, M., Khishe, K., Hashim Ibrahim, M.et al. (2021). Presenting the best prediction model of water inflow into drill and blast tunnels among several machine learning techniques. Mahmoodzadeh, A., Mohammadi, M., Hashim Ibrahim, H., Nariman Abdulhamid, S., Farid Hama Ali, H.et al. (2021). Machine learning forecasting models of disc cutters life of tunnel boring machine. 
Li, D., Armaghani, D. J., Zhou, J., Lai, S. H., Hasanipanah, M. (2020). A GMDH predictive model to predict rock material strength using three non-destructive tests. Armaghani, D. J., Mamou, A., Maraveas, C., Roussis, P. C., Siorikis, V. G.et al. (2021). Predicting the unconfined compressive strength of granite using only two non-destructive test indexes. Fang, Q., Yazdani Bejarbaneh, B., Vatandoust, M., Jahed Armaghani, D., Ramesh Murlidhar, B.et al. (2021). Strength evaluation of granite block samples with different predictive models. Li, Y., Hishamuddin, F. N. S., Mohammed, A. S., Armaghani, D. J., Ulrikh, D. V.et al. (2021). The effects of rock index tests on prediction of tensile strength of granitic samples: A neuro-fuzzy intelligent system. Parsajoo, M., Armaghani, D. J., Mohammed, A. S., Khari, M., Jahandari, S. (2021). Tensile strength prediction of rock material using non-destructive tests: A comparative intelligent study. Gokceoglu, C. (2002). A fuzzy triangular chart to predict the uniaxial compressive strength of the Ankara agglomerates from their petrographic composition. Barzegar, R., Sattarpour, M., Nikudel, M. R., Moghaddam, A. A. (2016). Comparative evaluation of artificial intelligence models for prediction of uniaxial compressive strength of travertine rocks, case study: Azarshahr area, NW Iran. Mishra, D., Basu, A. (2013). Estimation of uniaxial compressive strength of rock materials by index tests using regression analysis and fuzzy inference system. Saedi, B., Mohammadi, S. D., Shahbazi, H. (2019). Application of fuzzy inference system to predict uniaxial compressive strength and elastic modulus of migmatites. Yesiloglu-Gultekin, N., Sezer, E. A., Gokceoglu, C., Bayhan, H. (2013). An application of adaptive neuro fuzzy inference system for estimating the uniaxial compressive strength of certain granitic rocks from their mineral contents. Armaghani, D. J., Mohamad, E. T., Momeni, E., Narayanasamy, M. S. (2015). 
An adaptive neuro-fuzzy inference system for predicting unconfined compressive strength and Young’s modulus: A study on Main Range granite. Jing, H., Nikafshan Rad, H., Hasanipanah, M., Jahed Armaghani, D., Qasem, S. N. (2021). Design and implementation of a new tuned hybrid intelligent model to predict the uniaxial compressive strength of the rock using SFS-ANFIS. Wang, M., Wan, W. (2019). A new empirical formula for evaluating uniaxial compressive strength using the Schmidt hammer test. İnce, İ., Bozdağ, A., Fener, M., Kahraman, S. (2019). Estimation of uniaxial compressive strength of pyroclastic rocks (Cappadocia, Turkey) by gene expression programming. Özdemir, E. (2021). A new predictive model for uniaxial compressive strength of rock using machine learning method: Artificial intelligence-based age-layered population structure genetic programming (ALPS-GP). Rahman, T., Sarkar, K. (2021). Lithological control on the estimation of uniaxial compressive strength by the P-wave velocity using supervised and unsupervised learning. Cao, J., Gao, J., Rad, H. N., Mohammed, A. S., Hasanipanah, M.et al. (2021). A novel systematic and evolved approach based on XGBoost-firefly algorithm to predict Young’s modulus and unconfined compressive strength of rock. Gowida, A., Elkatatny, S., Gamal, H. (2021). Unconfined compressive strength (UCS) prediction in real-time while drilling using artificial intelligence tools. Mahmoodzadeh, A., Mohammadi, M., Ibrahim, H. H., Abdulhamid, S. N., Salim, S. G.et al. (2021). Artificial intelligence forecasting models of uniaxial compressive strength. Zhou, J., Li, X., Mitri, H. S. (2016). Classification of rockburst in underground projects: Comparison of ten supervised learning methods. Wang, S. M., Zhou, J., Li, C. Q., Armaghani, D. J., Li, X. B.et al. (2021). Rockburst prediction in hard rock mines developing bagging and boosting tree-based ensemble techniques. Li, D., Liu, Z., Armaghani, D. J., Xiao, P., Zhou, J. (2022). 
Novel ensemble intelligence methodologies for rockburst assessment in complex and variable environments. Li, D., Liu, Z., Armaghani, D. J., Xiao, P., Zhou, J. (2022). Novel ensemble tree solution for rockburst prediction using deep forest. Li, D., Liu, Z., Xiao, P., Zhou, J., Jahed Armaghani, D. (2022). Intelligent rockburst prediction model with sample category balance using feedforward neural network and Bayesian optimization. Zhou, J., Qiu, Y., Armaghani, D. J., Zhang, W., Li, C.et al. (2021). Predicting TBM penetration rate in hard rock condition: A comparative study among six XGB-based metaheuristic techniques. Qiu, Y., Zhou, J., Khandelwal, M., Yang, H., Yang, P.et al. (2021). Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration. Zhou, J., Qiu, Y., Khandelwal, M., Zhu, S., Zhang, X. (2021). Developing a hybrid model of Jaya algorithm-based extreme gradient boosting machine to estimate blast-induced ground vibrations. Freund, Y., Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Chen, T., Guestrin, C. (2016). In XGboost: A scalable tree boosting system. Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, pp. 785–794. Association for Computing Machinery, San Francisco, California, USA. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., Gulin, A. (2017). CatBoost: Unbiased boosting with categorical features. arXiv preprint arXiv:1706.09516.Dorogush, A. V., Ershov, V., Gulin, A. (2018). CatBoost: Gradient boosting with categorical features support. arXiv preprint arXiv: 1810.11363.Armaghani, D. J., Amin, M. F. M., Yagiz, S., Faradonbeh, R. S., Abdullah, R. A. (2016). Prediction of the uniaxial compressive strength of sandstone using various modeling techniques. 
Ajalloeian, R., Jamshidi, A., Khorasani, R. (2020). Evaluating the effects of mineral grain size and mineralogical composition on the correlated equations between strength and schmidt hardness of granitic rocks. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B.et al. (2011). Scikit-learn: Machine learning in Python. Mohamad, E. T., Armaghani, D. J., Mahdyar, A., Komoo, I., Kassim, K. A.et al. (2017). Utilizing regression models to find functions for determining ripping production based on laboratory tests. Huang, L., Asteris, P. G., Koopialipoor, M., Armaghani, D. J., Tahir, M. (2019). Invasive weed optimization technique-based ANN to the prediction of rock tensile strength. Yang, H., Koopialipoor, M., Armaghani, D. J., Gordan, B., Khorami, M.et al. (2019). Intelligent design of retaining wall structures under dynamic conditions. Armaghani, D. J., Asteris, P. G., Fatemi, S. A., Hasanipanah, M., Tarinejad, R.et al. (2020). On the use of neuro-swarm system to forecast the pile settlement. Jahed Armaghani, D., Hasanipanah, M., Bakhshandeh Amnieh, H., Tien Bui, D., Mehrabi, P.et al. (2020). Development of a novel hybrid intelligent model for solving engineering problems using GS-GMDH algorithm. Taylor, K. E. (2001). Summarizing multiple aspects of model performance in a single diagram. Ghasemi, E., Kalhori, H., Bagherpour, R., Yagiz, S. (2018). Model tree approach for predicting uniaxial compressive strength and Young’s modulus of carbonate rocks. Armaghani, D. J., Mohamad, E. T., Hajihassani, M., Yagiz, S., Motaghedi, H. (2016). Application of several non-linear prediction tools for estimating uniaxial compressive strength of granitic rocks and comparison of their performances. Zhou, Z., Hooker, G. (2021). Unbiased measurement of feature importance in tree-based methods.