To overcome the deficiencies of current methods for predicting the productivity of shale gas horizontal wells after fracturing, a new approach is proposed in this study. The model combines several techniques, namely, artificial neural network (ANN), particle swarm optimization (PSO), imperialist competitive algorithm (ICA), and ant colony optimization (ACO). These are implemented using the geological and engineering parameters collected from 317 wells. The results show that the optimum PSO-ANN model has a high accuracy, obtaining an R^{2} of 0.847 on the testing set. The partial dependence plots (PDP) indicate that liquid consumption intensity and the proportion of quartz sand are the two most sensitive factors affecting the model’s performance.

Predrilling prediction of productivity for shale gas horizontal wells is an important link in the formulation and optimization of development schemes for shale gas. It provides an assessment basis for the investment risk of shale reservoir development and is particularly important for guiding the development [

However, the current productivity prediction technology for shale gas horizontal wells is not yet mature. The available approaches mainly include physical simulation methods, empirical formula methods, analytical methods and numerical simulation. Notably, analytical methods can only study the two-dimensional seepage process of single-phase fluids in homogeneous formations; they cannot predict the production of gas wells with complex gas deposits [

ANN is an artificial intelligence method inspired by biological research. It can solve complex nonlinear problems through self-learning. Currently, machine-learning methods have been widely applied to estimate oil and coalbed methane well production performance [

No matter what problems the machine learning methods have solved, an enlarged database is an important prerequisite to obtain a satisfactory prediction performance and improve the generalisation capability of the model [

The artificial neural network is composed of a large number of interconnected neurons. The function of each neuron is relatively simple and is given by:

where Y is the neuron output; w_{i} are the weights; X_{i} are the neuron inputs and a is bias.
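A standard form of this neuron model, consistent with the symbols defined above and assuming an activation function f and n inputs (neither is named in the text), is:

```latex
Y = f\left( \sum_{i=1}^{n} w_{i} X_{i} + a \right)
```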

Neurons in different layers are connected by weights, and the mapping between inputs and outputs is computed as follows:

where W_{i}, V and the vector a_{i} are model parameters; L is the number of layers.
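As a minimal sketch of such a layered mapping, the following forward pass uses hidden-layer weight matrices, bias vectors and an output matrix V. The tanh activation, the linear output layer, the random parameter values and the function name `forward` are assumptions for illustration only; the layer sizes follow the 15-69-1 architecture reported later in the paper.

```python
import numpy as np

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a fully connected network.

    Assumption: tanh hidden activations and a linear output layer.
    """
    h = np.asarray(x, dtype=float)
    for W, a in zip(hidden_weights, hidden_biases):
        h = np.tanh(h @ W + a)  # weighted sum plus bias, then activation
    return h @ output_weights + output_bias

# Hypothetical parameters for a 15-69-1 network (random, untrained).
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(15, 69))
a1 = np.zeros(69)
V = rng.normal(scale=0.1, size=(69, 1))
b = np.zeros(1)

x_batch = rng.random((4, 15))           # 4 wells, 15 normalized inputs
y_hat = forward(x_batch, [W1], [a1], V, b)
```

Passing longer lists of weight matrices and bias vectors gives a network with more hidden layers without changing the function.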

The network is trained by optimizing the weights until the output values are as close as possible to the actual outputs. The architecture of the ANN was tuned based on the mean squared error (MSE), which is defined as:

where Y_{i}^{*} and Y_{i} are the predicted and true values of the test production per unit well length.
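The MSE in its usual form is the average of the squared differences between Y_{i}^{*} and Y_{i} over all samples. A minimal sketch is shown below; the function name and the example values are hypothetical.

```python
import numpy as np

def mean_squared_error(y_pred, y_true):
    """Average of squared differences between predicted and true values."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Hypothetical normalized productions for three wells.
mse_example = mean_squared_error([0.2, 0.4, 0.9], [0.1, 0.5, 0.7])
```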

PSO can find the optimum through information sharing between individuals in the swarm [

The particles update their positions according to the following formula:

where a real vector is used to represent the position of a single particle; a_{1} and a_{2} are the acceleration coefficients; and f_{1} and f_{2} are random numbers between 0 and 1.

where k is the current iteration number of the swarm; T_{max} is the maximum number of iterations; ω_{max} is the maximum inertia weight, and ω_{min} is the minimum inertia weight. ω_{max} is generally set to 0.9, and ω_{min} to 0.4.
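The update rules can be sketched as follows, assuming the common velocity and position update with a linearly decreasing inertia weight, ω = ω_{max} − (ω_{max} − ω_{min}) k / T_{max}. That schedule, the search bounds of [−3, 3] and the sphere test function are assumptions; the swarm size, iteration count, cognitive factors and maximum velocity follow the PSO settings reported later in the paper.

```python
import numpy as np

def inertia_weight(k, t_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight (assumed schedule)."""
    return w_max - (w_max - w_min) * k / t_max

def pso_minimize(objective, dim, n_particles=30, t_max=200,
                 a1=1.2, a2=1.2, v_max=0.6, bounds=(-3.0, 3.0), seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                    # velocities
    p_best = x.copy()                                   # personal bests
    p_best_val = np.array([objective(p) for p in x])
    g_idx = int(np.argmin(p_best_val))
    g_best, g_best_val = p_best[g_idx].copy(), p_best_val[g_idx]

    for k in range(t_max):
        w = inertia_weight(k, t_max)
        f1 = rng.random((n_particles, dim))             # random numbers in [0, 1)
        f2 = rng.random((n_particles, dim))
        v = w * v + a1 * f1 * (p_best - x) + a2 * f2 * (g_best - x)
        v = np.clip(v, -v_max, v_max)                   # limit particle velocity
        x = np.clip(x + v, lo, hi)                      # keep particles in bounds

        vals = np.array([objective(p) for p in x])
        improved = vals < p_best_val
        p_best[improved] = x[improved]
        p_best_val[improved] = vals[improved]
        g_idx = int(np.argmin(p_best_val))
        if p_best_val[g_idx] < g_best_val:
            g_best, g_best_val = p_best[g_idx].copy(), p_best_val[g_idx]

    return g_best, float(g_best_val)

# Toy objective (sphere function) standing in for the network's MSE.
best_x, best_val = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=3)
```

To optimize an ANN, the objective would instead map a flat parameter vector to the network's MSE on the training data.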

The flowchart of the PSO algorithm is demonstrated in

ICA starts with initial populations called countries [

ACO is a bionic intelligent optimization algorithm inspired by the foraging behavior of ants. Ants leave pheromones on their way to food sources, and other ants in the colony sense these pheromones and move along paths with high pheromone concentrations, forming a positive feedback mechanism. After a period of time, the ants converge on an optimal path to the food source. The basic idea of optimizing ANN with ACO is as follows. First, the elements of the weight matrix and the bias vector are taken out to form the path coordinates of the ant population. Because a shorter path to the food source carries a higher pheromone content, the mean squared error (MSE) is used as the ant’s fitness value. The shortest path determined by the final ant population is used as the optimal initial weight and bias. The optimal weight and bias are then assigned to the ANN for training and testing, and the error is compared with the prediction of the ANN before optimization.
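The encoding step described above, in which weights and biases are flattened into one coordinate vector and scored by MSE, can be sketched as follows. The parameter layout, the tanh activation and all names are assumptions; the same fitness function could serve PSO, ICA or ACO. This sketch covers only the encoding and fitness evaluation, not the ACO search itself.

```python
import numpy as np

N_IN, N_HID, N_OUT = 15, 69, 1   # 15-69-1 architecture
N_PARAMS = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT

def decode(theta):
    """Split a flat parameter vector into weights and biases (assumed layout)."""
    theta = np.asarray(theta, dtype=float)
    i = 0
    W = theta[i:i + N_IN * N_HID].reshape(N_IN, N_HID)
    i += N_IN * N_HID
    a = theta[i:i + N_HID]
    i += N_HID
    V = theta[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT)
    i += N_HID * N_OUT
    b = theta[i:i + N_OUT]
    return W, a, V, b

def fitness(theta, X, y):
    """MSE of the network defined by `theta`; lower is better."""
    W, a, V, b = decode(theta)
    y_hat = (np.tanh(X @ W + a) @ V + b).ravel()   # tanh is an assumption
    return float(np.mean((y_hat - y) ** 2))

# Hypothetical data: 20 wells with 15 normalized inputs each.
rng = np.random.default_rng(0)
X_demo = rng.random((20, N_IN))
y_demo = rng.random(20)
theta_demo = rng.uniform(-3, 3, N_PARAMS)   # one candidate inside [-3, 3]
fit_demo = fitness(theta_demo, X_demo, y_demo)
```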

PDP is a method used to determine the dependence of prediction on input variables. PDP represents the marginal impact of one or two features on the prediction results of the machine learning model, that is, how the variables affect the prediction results. The partial dependence function for regression is as follows:

where x_{s} represents the characteristic variable of interest and x_{c} represents other variables.

A function f(x_{s}) that depends only on x_{s} can be obtained by integrating over x_{c}. This function is called the partial dependence function because it allows the effect of the single variable x_{s} to be interpreted. In practice, the Monte Carlo method is used to estimate the partial dependence function by averaging over the training set. The specific formula is as follows:

where n represents the sample size.
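In the usual formulation of partial dependence, the two expressions referred to above take the following form. The symbol \hat{f} for the trained black box model, p(x_{c}) for the distribution of the other variables and x_{c}^{(i)} for their values in the i-th training sample are notational assumptions.

```latex
f(x_{s}) = E_{x_{c}}\left[ \hat{f}(x_{s}, x_{c}) \right]
         = \int \hat{f}(x_{s}, x_{c}) \, p(x_{c}) \, dx_{c}

f(x_{s}) \approx \frac{1}{n} \sum_{i=1}^{n} \hat{f}\left( x_{s}, x_{c}^{(i)} \right)
```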

The specific implementation steps of the single variable PDP are as follows: (1) Select a characteristic variable of interest for research and define the searching grid. (2) Substitute each value in the searching grid into x_{s} in the above PDP function, use the black box model to make predictions, and obtain the average predicted value. (3) The relationship curve between the variable and the predicted value is the partial dependence graph.
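The three steps can be sketched as follows. The toy model, the data and the function names are hypothetical; any trained model exposing a prediction function could take the place of `toy_model`.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid):
    """Average prediction with one feature fixed at each grid value."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value                 # step 2: substitute grid value
        averages.append(float(np.mean(predict(X_mod))))  # average over samples
    return np.array(averages)                     # step 3: points of the PDP curve

def toy_model(X):
    """Stand-in black box model (hypothetical)."""
    return 2.0 * X[:, 0] + X[:, 1] ** 2

rng = np.random.default_rng(0)
X_train = rng.random((50, 2))
grid = np.linspace(0.0, 1.0, 5)                   # step 1: searching grid
pdp_curve = partial_dependence_1d(toy_model, X_train, feature=0, grid=grid)
```

For this toy model the curve is a straight line of slope 2 in the first feature, shifted by the average contribution of the second feature.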

The production of shale gas horizontal wells is affected by geological factors and engineering factors. Geological parameters guide the comprehensive analysis and effective stimulation of the target reservoir. Taking into account the previous production measures, production performance, and profile modification of adjacent wells, reasonable fracturing operation parameters can effectively improve the production of a single well.

Geological factors

When the vertical depth of shale reservoir increases, especially when it exceeds 3500 m, the horizontal

Engineering factors

Studies have reported that reasonable distribution of cluster spacing is conducive to the increase of stimulated reservoir volume [

Therefore, the above eight geological parameters and seven engineering parameters were selected as the key research variables.

A normalized input signal brings the sample average close to zero, which accelerates the learning of the ANN model. In this study, the geological and engineering parameters of 317 shale gas wells were normalized as input variables of the ANN model according to

Therefore, data samples comprising the normalized test production per unit well length of each shale gas horizontal well, the eight geological parameters, and the seven engineering parameters were used in this study.

where x_{i}* represents normalized input variable; x_{i} represents unnormalized input variable; x_{min} represents the minimum value of the input variable; x_{max} represents the maximum value of the input variable; y_{i}* represents normalized output variables; y_{i} represents unnormalized output variables; y_{min} represents the minimum value of the output variable; and y_{max} represents the maximum value of the output variable.
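A minimal sketch of min-max normalization consistent with these definitions is given below. It assumes scaling to the range [0, 1], that is x_{i}* = (x_{i} − x_{min}) / (x_{max} − x_{min}); the paper's exact scaling range is not recoverable from the text, and the example values are hypothetical.

```python
import numpy as np

def min_max_normalize(values):
    """Scale each column to [0, 1] (assumed range).

    Note: a constant column would give a zero denominator.
    """
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(axis=0), values.max(axis=0)
    return (values - v_min) / (v_max - v_min), v_min, v_max

def denormalize(scaled, v_min, v_max):
    """Map normalized values back to the original units."""
    return np.asarray(scaled, dtype=float) * (v_max - v_min) + v_min

# Hypothetical inputs: vertical depth (m) and porosity (fraction) for 3 wells.
raw = np.array([[3500.0, 0.02], [2500.0, 0.06], [3000.0, 0.04]])
scaled, col_min, col_max = min_max_normalize(raw)
```

Keeping the column minima and maxima allows predictions made on the normalized scale to be converted back to physical units.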

It is worth noting that the correlation between the input variables of the ANN model should not be too strong; otherwise, it will affect the accuracy of prediction.

To predict the test production per unit well length with the three artificial intelligence techniques, the ANN model was developed first. Then, PSO, ICA and ACO were used to optimize the ANN model by optimizing its weights and biases. Based on trial tuning and experience, and considering the complexity of the input parameters, the tuning range for the number of neurons was 1-120. A “trial and error” (TAE) procedure was conducted on ANN models with one and two hidden layers. Ultimately, the ANN model 15-69-1 was selected as the best ANN architecture for predicting test production per unit well length in this study. The optimum ANN architecture used for further analysis is illustrated in

The reliability of the ANN model was evaluated using statistical descriptors calculated between the predicted and actual shale gas test production per unit well length: the coefficient of determination (R^{2}), root-mean-square error (RMSE), slope of the regression line (k), and Willmott’s index of agreement (IA). Based on the statistical recommendation, a good prediction satisfies R^{2} > 0.64, 0.85 < k < 1.15, or IA > 0.80. R^{2}, RMSE, k and IA are defined as follows:

where N represents the number of samples in the dataset; y_{i} and y_{i}* represent actual values and predicted values, respectively.
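A minimal sketch of the four descriptors in their commonly used forms follows. The exact definitions used in the paper are not recoverable from the text, so these are assumptions: R^{2} as one minus the ratio of residual to total sum of squares, k as the slope of a regression line through the origin, and IA as Willmott's original index.

```python
import numpy as np

def r_squared(y, y_pred):
    """1 - SS_res / SS_tot (assumed definition)."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y, y_pred):
    """Root-mean-square error."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

def slope_through_origin(y, y_pred):
    """Slope k of the line y_pred = k * y fitted through the origin (assumed)."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sum(y * y_pred) / np.sum(y ** 2))

def index_of_agreement(y, y_pred):
    """Willmott's index of agreement (original form)."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    y_bar = y.mean()
    num = np.sum((y - y_pred) ** 2)
    den = np.sum((np.abs(y_pred - y_bar) + np.abs(y - y_bar)) ** 2)
    return float(1.0 - num / den)

# Hypothetical actual and predicted values.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
scores = {
    "R2": r_squared(y_true, y_hat),
    "RMSE": rmse(y_true, y_hat),
    "k": slope_through_origin(y_true, y_hat),
    "IA": index_of_agreement(y_true, y_hat),
}
```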

The PSO algorithm parameters were set before optimization of the ANN model, as shown in the following table.

Parameters | Value | Parameters | Value |
---|---|---|---|
Inertia weight | 0.9 | Maximum particle’s velocity | 0.6 |
Number of iteration | 200 | Individual cognitive | 1.2 |
Number of particle swarms | 30 | Group cognitive | 1.2 |

On the training set, the optimum PSO-ANN model obtained R^{2} = 0.876. Its predictive performance on the testing set was as follows: IA = 0.965, k = 0.957, RMSE = 0.0009, R^{2} = 0.847.

In this section, the ICA was used to optimize the weights and biases of the selected initial ANN model. The ICA algorithm parameters were set before optimization of the ANN model, as shown in the following table.

Parameters | Value | Parameters | Value |
---|---|---|---|
Maximum number of iterations | 200 | Number of initial countries | 30 |
Assimilation coefficient | 2 | Initial imperialists | 5 |
Lower-upper limit of the optimization region | [−3, 3] | – | – |

On the training set, the optimum ICA-ANN model obtained R^{2} = 0.867. Its predictive performance on the testing set was as follows: IA = 0.958, k = 0.946, RMSE = 0.0009, R^{2} = 0.842.

In this section, the ACO was used to optimize the weights and biases of the selected initial ANN model. The ACO algorithm parameters were set before optimization of the ANN model, as shown in the following table.

Parameters | Value | Parameters | Value |
---|---|---|---|
Pheromone volatile factor | 0.5 | Pheromone factor | 1.5 |
Boundary of the parameters | [−3, 3] | Factor of heuristic function | 0.3 |
The maximum number of iterations | 200 | Ants population | 50 |

On the training set, the optimum ACO-ANN model obtained R^{2} = 0.862. Its predictive performance on the testing set was as follows: IA = 0.954, k = 0.951, RMSE = 0.0009, R^{2} = 0.838.

The results of the three developed models were compared through the ranking and intensity of color. From

Model | R^{2} | RMSE | k | IA | Rank for R^{2} | Rank for RMSE | Rank for k | Rank for IA | Total rank |
---|---|---|---|---|---|---|---|---|---|
PSO-ANN | 0.876 | 0.0002 | 0.986 | 0.968 | 3 | 3 | 2 | 3 | 11 |
ICA-ANN | 0.867 | 0.0023 | 0.986 | 0.963 | 2 | 2 | 2 | 2 | 8 |
ACO-ANN | 0.862 | 0.0024 | 0.966 | 0.961 | 1 | 1 | 1 | 1 | 4 |

Based on the reported performance indicators (R^{2}, RMSE, k, IA), the PSO-ANN model ranked highest and provided the best performance on both the training and testing sets (i.e., lowest error).

Model | R^{2} | RMSE | k | IA | Rank for R^{2} | Rank for RMSE | Rank for k | Rank for IA | Total rank |
---|---|---|---|---|---|---|---|---|---|
PSO-ANN | 0.847 | 0.0009 | 0.957 | 0.965 | 3 | 2 | 3 | 3 | 11 |
ICA-ANN | 0.842 | 0.0009 | 0.946 | 0.958 | 2 | 1 | 1 | 2 | 6 |
ACO-ANN | 0.838 | 0.0009 | 0.951 | 0.954 | 1 | 1 | 2 | 1 | 5 |
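The ranking scheme can be illustrated with the values of the first comparison table (the one in which PSO-ANN has R^{2} = 0.876). The sketch below assumes that the worst model on each indicator receives rank 1, that ties share a rank, and that k is judged by its closeness to 1; these rules are inferred from the table, not stated in the text. Under these assumptions the totals in that table are reproduced.

```python
def dense_rank(scores):
    """Rank scores so the worst gets 1; ties share a rank (higher = better)."""
    levels = sorted(set(scores))
    return [levels.index(s) + 1 for s in scores]

models = ["PSO-ANN", "ICA-ANN", "ACO-ANN"]
r2 = [0.876, 0.867, 0.862]
rmse = [0.0002, 0.0023, 0.0024]
k = [0.986, 0.986, 0.966]
ia = [0.968, 0.963, 0.961]

ranks = [
    dense_rank(r2),                            # higher R2 is better
    dense_rank([-v for v in rmse]),            # lower RMSE is better
    dense_rank([-abs(v - 1.0) for v in k]),    # k closer to 1 is better (assumed)
    dense_rank(ia),                            # higher IA is better
]
total_rank = {m: sum(col[i] for col in ranks) for i, m in enumerate(models)}
```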

Results indicated that the order of relative importance of the 15 variables was as follows: liquid consumption intensity > quartz sand proportion > brittleness index > cluster number > Poisson’s ratio > flow back rate > displacement > vertical depth > porosity > slick water proportion > total gas content > proppant injection intensity > Young’s modulus > pressure coefficient > TOC. According to the above order, the corresponding variable serial numbers were a, b, c, d, e, f, g, h, i, j, k, l, m, n, and o.

The main aim of this study was to verify the PSO-ANN method for predicting shale gas horizontal well production. The PSO-ANN method has the advantages of low time consumption and low cost, which are more obvious in research with large data samples. In addition, the PSO-ANN method has the following advantages: (1) The accuracy of the PSO-ANN method does not suffer from idealized assumptions and parameter settings, and the method can automatically learn the nonlinear relationship between input and output variables using only input variables. (2) The hybrid models can directly predict the test production per unit well length from the influencing variables, without field production testing; that is, there is no need for history matching data in the early stage of horizontal shale gas well drainage. (3) A more comprehensive data set can be used to easily build and update a general model, which indicates that the generalization capability of the hybrid model was good.

Although a large number of scientific studies have shown that the application of ML technology to natural gas production prediction is very promising, there are still challenges. First of all, the dataset came from shale gas wells in China, and the trained model may not generalize to shale gas wells in other regions because the characterization and fracturing techniques in other reservoirs are different. Thus, a cross-regional and highly accessible database is an important prerequisite. How to improve the prediction accuracy is another challenge because advanced algorithms are rare and urgently needed. Lastly, how to apply artificial intelligence technology to other aspects of the shale gas horizontal well fracturing design process is also worth studying.

This paper proposed three artificial intelligence techniques for predicting shale gas production based on the ANN combined with PSO, ICA, and ACO. The three techniques were compared, and the relative variable importance was investigated using PDP. According to the results of this study, the following conclusions can be drawn:

(1) The optimum PSO-ANN model constructed for predicting the productivity of shale gas horizontal wells had 1 hidden layer with 69 neurons.

(2) The PSO provided the highest performance in optimizing the ANN model. The predictive performance on the testing set was IA = 0.965, k = 0.957, RMSE = 0.0009 and R^{2} = 0.847. The optimized PSO-ANN model was successful in learning the nonlinear relationship between shale gas production and the variables affecting the prediction.

(3) PDP indicated that liquid consumption intensity and the proportion of quartz sand are the two most sensitive factors affecting the performance of the optimum PSO-ANN model in predicting the productivity of shale gas horizontal wells.

This study was financially supported by China United Coalbed Methane Corporation, Ltd. (ZZGSSALFGR2021-581); Bin Li received the grant.

Bin Li conceived and designed the study, collected the data, analyzed the results, drafted the manuscript, reviewed the results and approved the final version of the manuscript.

The author declares that he has no conflicts of interest to report regarding the present study.
