Existing prediction methods have difficulty handling the multiple influencing factors in short-term power load forecasting, so we propose a bidirectional long short-term memory (BiLSTM) neural network model based on the temporal pattern attention (TPA) mechanism. Firstly, datasets similar to the forecast day are obtained through grey relational analysis. Secondly, the bidirectional LSTM layer models the historical load, temperature, humidity, and date-type data, and complex relationships between the data are extracted from the row vectors of the hidden states produced by the BiLSTM network, so that influencing factors with different characteristics can select relevant information from different time steps, which reduces the prediction error of the model. Simultaneously, the TPA mechanism extracts the complex, nonlinear dependencies between time steps and sequences: an attention weight vector is constructed for the hidden layer output of the BiLSTM, and the relevant variables at different time steps are weighted accordingly. Finally, the chaotic sparrow search algorithm (CSSA) is used to optimize the hyperparameter selection of the model. Experiments on two datasets show that the mean absolute errors of our method are 0.876 and 4.238, respectively, which are lower than those of the other forecasting methods compared, demonstrating the accuracy and stability of our model.

The power system presents a market-oriented trend, and the accurate forecasting of power loads is one of its key tasks [

The commonly used methods for short-term power load forecasting fall into two main categories: traditional time series forecasting models and machine learning forecasting models. The time series forecasting models, such as the exponential smoothing analysis [

Nevertheless, the emergence of machine learning prediction models, like neural networks [

The central idea of the swarm intelligence optimization algorithm is to search for the optimal solution in the solution space within a specific range by simulating bionics [

Herein, we propose a BiLSTM neural network prediction method based on temporal pattern attention (BiLSTM-TPA) for short-term power load forecasting. Considering the inner relationship between the multiple variables and the time series, the grey correlation analysis method is used to determine a sample set that is highly correlated with the day to be predicted, which eases the neural network's prediction task. Then, the forward and backward internal characteristics of the power load data are learned through the bidirectional LSTM layer, and the TPA mechanism is combined to further learn the interdependence between multiple variables at different times and sequences. Finally, the CSSA is used to optimize the hyperparameters of the BiLSTM-TPA model to obtain the final prediction results. Training on load data from different regions of China and comparison with other prediction algorithms show that the proposed model achieves a lower mean absolute percentage error, root mean square error, and mean absolute error.

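The grey correlation is a measure of the magnitude of the association between two or more factors, and the correlation indicates the degree to which the factors affecting the development of something influence each other [

The feature sequence X_{i} on the i^{th} day is defined as:

$$X_i = \left(x_i(1), x_i(2), \ldots, x_i(n)\right)$$

The feature sequence of the day to be predicted is X_{0}, which is defined as:

$$X_0 = \left(x_0(1), x_0(2), \ldots, x_0(n)\right)$$

where n is the number of influencing factors.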

Each element in the total sequence and the subsequence is divided by the mean value of its respective vector for averaging; let the averaged vectors be X_{i}’ and X_{0}’. The correlation coefficient of X_{i}’ and X_{0}’ for the k^{th} factor is ζ_{i}(k).

Each factor corresponds to a correlation coefficient, so there are n correlation coefficients, the average of which is the correlation degree between the systems:

$$r_i = \frac{1}{n}\sum_{k=1}^{n}\zeta_i(k)$$

where a larger r_{i} indicates a higher degree of correlation.
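The selection of similar days described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the sample feature values, and the resolution coefficient `rho = 0.5` are assumptions (0.5 is a common default in grey relational analysis, and the text does not state the value used). The coefficient formula is the standard grey relational coefficient.

```python
def mean_normalize(seq):
    """Divide each element by the mean of its own vector (assumes a nonzero mean)."""
    mean = sum(seq) / len(seq)
    return [v / mean for v in seq]


def grey_relational_degrees(reference, candidates, rho=0.5):
    """Return one correlation degree per candidate day.

    rho is the resolution coefficient; 0.5 is an assumed default.
    """
    x0 = mean_normalize(reference)
    diffs = [[abs(a - b) for a, b in zip(x0, mean_normalize(c))] for c in candidates]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if d_max == 0.0:  # every candidate equals the reference
        return [1.0] * len(candidates)
    degrees = []
    for row in diffs:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))  # average of the n coefficients
    return degrees


# Hypothetical day features: average, maximum, minimum temperature,
# relative humidity, week type.
forecast_day = [22.0, 27.0, 18.0, 65.0, 0.8]
history = [
    [22.0, 27.0, 18.0, 65.0, 0.8],  # identical to the forecast day
    [21.0, 26.0, 17.5, 68.0, 0.8],  # similar day
    [5.0, 9.0, 1.0, 95.0, 0.4],     # dissimilar day
]
degrees = grey_relational_degrees(forecast_day, history)
similar_days = [i for i, r in enumerate(degrees) if r > 0.7]  # threshold from the text
```

Days whose degree exceeds the threshold would form the similar-day dataset that is passed on to the network.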

Bidirectional long short-term memory (BiLSTM) is composed of a forward LSTM and a backward LSTM. LSTM is a special recurrent neural network that controls the transmission state through gates, remembering the information that needs to be kept for a long time and forgetting unimportant information. Its structure is shown in

The LSTM unit has forget, input, and output gates. In the structure, C_{t−1} represents the state of the previous cell; h_{t−1} represents the output of the previous unit; and x_{t} represents the current input. f_{t} is the degree of information forgetting; i_{t} represents the degree of input information retention; and o_{t} represents the output of the output gate.

The bidirectional LSTM allows the relationships of load sequences to be extracted from the forward and backward directions and connected to the same output to ensure the full utilization of data information and avoid early information forgetting caused by overlong data time series. Its network structure is shown in
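To make the gate mechanics and the two reading directions concrete, the following is a minimal sketch of a scalar LSTM cell (hidden size 1, input size 1) run forward and backward over a short load sequence. The gate equations are the standard LSTM ones; the parameter values and helper names are hypothetical, and a real model would use vector-valued states and trained weights.

```python
import math

GATES = ("f", "i", "o", "g")  # forget, input, output, candidate


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_params(w, u, b):
    """Use the same (input weight, recurrent weight, bias) for every gate."""
    return {g: (w, u, b) for g in GATES}


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell."""
    pre = {g: p[g][0] * x_t + p[g][1] * h_prev + p[g][2] for g in GATES}
    f_t = sigmoid(pre["f"])        # how much of the old cell state to keep
    i_t = sigmoid(pre["i"])        # how much of the candidate to admit
    o_t = sigmoid(pre["o"])        # how much of the cell state to expose
    g_t = math.tanh(pre["g"])      # candidate cell state
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


def run_lstm(xs, p):
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def run_bilstm(xs, p_fwd, p_bwd):
    """Pair the forward state with the backward state at every time step."""
    forward = run_lstm(xs, p_fwd)
    backward = run_lstm(xs[::-1], p_bwd)[::-1]
    return list(zip(forward, backward))


loads = [0.2, 0.5, 0.9, 0.4]                 # hypothetical normalized loads
p_fwd = make_params(0.5, 0.3, 0.0)           # hypothetical weights
p_bwd = make_params(-0.4, 0.2, 0.1)
states = run_bilstm(loads, p_fwd, p_bwd)
```

Each element of `states` holds the forward summary of everything up to that step and the backward summary of everything after it, which is the information the attention layer later works on.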

An attention mechanism is a resource allocation mechanism that mimics the attention of the human brain. In general, the human brain focuses its attention on the areas of interest at a particular moment, reducing or even eliminating the attention paid to other areas to obtain more detailed information that needs to be focused on, thereby suppressing other useless information, ignoring irrelevant information and amplifying the required information [

TPA extracts important features from the row vectors of the BiLSTM hidden state matrix through multiple one-dimensional convolutional neural network (CNN) filters. Therefore, the model learns the interdependence between multiple variables within the same time step and across all previous times and sequences. Its structure is shown in

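The hidden state matrix is H = {h_{t−w}, h_{t−w+1}, …, h_{t−1}}, where w is the length of the time window. C_{j} denotes the j^{th} filter of length T; T represents the length of the data processed by the attention mechanism and takes the value of w in this paper; H^{C}_{i,j} represents the result value of the action of the i^{th} row vector and the j^{th} convolution kernel:

$$H^{C}_{i,j} = \sum_{l=1}^{w} H_{i,(t-w-1+l)} \times C_{j,(T-w+l)}$$

The scoring function relating each row of the convolved matrix to h_{t} is:

$$f\left(H^{C}_{i}, h_t\right) = \left(H^{C}_{i}\right)^{\top} W_a h_t$$

where H^{C}_{i} is the i^{th} row of H^{c}; W_{a} is the weight matrix of m × k. The sigmoid function is used for normalization to obtain the attention weight, which is convenient for selecting multiple variables. Attention weight α_{i} is defined as:

$$\alpha_i = \mathrm{sigmoid}\left(f\left(H^{C}_{i}, h_t\right)\right)$$

Each row in H^{c} is weighted by its attention weight α_{i} and the rows are summed to get the output V_{t}:

$$V_t = \sum_{i} \alpha_i H^{C}_{i}$$

V_{t} is fused with the output h_{t} at the last moment to give the final predicted output:

$$h'_t = W_h h_t + W_v V_t, \qquad y = W_{h'} h'_t$$

where W_{h}, W_{v}, and W_{h'} are weight matrices.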
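The attention computation described in this section can be sketched as follows. It follows the usual temporal pattern attention recipe (convolve each row of the hidden state matrix, score against h_{t}, apply a sigmoid, take the weighted sum, fuse with h_{t}). The function name, the toy numbers, and the matrix shapes are assumptions made for this sketch: `W_a` has one row per filter and one column per hidden unit so that the score is defined, and all weights here are untrained placeholders.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def temporal_pattern_attention(H, filters, h_t, W_a, W_h, W_v):
    """H: one row per hidden unit, one column per past time step.
    filters: convolution kernels, each as long as a row of H.
    Returns the attention weights, the context vector and the fused state."""
    # Convolve every row of H with every filter.
    HC = [[sum(h * c for h, c in zip(row, flt)) for flt in filters] for row in H]
    # Score each row of HC against the current hidden state h_t.
    Wa_h = matvec(W_a, h_t)
    alpha = [sigmoid(sum(a * b for a, b in zip(hc_row, Wa_h))) for hc_row in HC]
    # Weighted sum of the rows of HC gives the context vector.
    v_t = [sum(alpha[i] * HC[i][j] for i in range(len(HC))) for j in range(len(filters))]
    # Fuse the context vector with h_t.
    fused = [a + b for a, b in zip(matvec(W_h, h_t), matvec(W_v, v_t))]
    return alpha, v_t, fused


# Toy example: 2 hidden units, window of 3 steps, 2 filters (all values hypothetical).
H = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
filters = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
h_t = [0.3, 0.6]
W_a = [[0.2, -0.1], [0.4, 0.3]]
W_h = [[1.0, 0.0], [0.0, 1.0]]
W_v = [[0.5, 0.0], [0.0, 0.5]]
alpha, v_t, fused = temporal_pattern_attention(H, filters, h_t, W_a, W_h, W_v)
```

Because the weights come from a sigmoid rather than a softmax, several rows can receive a high weight at once, which is what allows more than one variable to be selected.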

The SSA sparrow search algorithm is a new swarm intelligence optimization algorithm proposed by Xue et al. [

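The Tent chaotic sequence has small periods and unstable periodic points, for which the random variable rand(0, 1) × 1/N_{T} is introduced into the Tent map:

$$z_{i+1} = \begin{cases} 2 z_i + \mathrm{rand}(0,1) \times \dfrac{1}{N_T}, & 0 \le z_i \le \dfrac{1}{2} \\ 2\left(1 - z_i\right) + \mathrm{rand}(0,1) \times \dfrac{1}{N_T}, & \dfrac{1}{2} < z_i \le 1 \end{cases}$$

where N_{T} is the number of particles in the chaotic sequence, and rand (0, 1) is a random number between [0, 1]. In the process of population initialization of the SSA algorithm, the Tent chaotic sequence is introduced to initialize the population, and N D-dimensional vectors are generated. Each component is carried to the value range of the original problem space variable through:

$$X_{new}^{d} = d_{min} + \left(d_{max} - d_{min}\right) z_d$$

where d_{max} and d_{min} are the maximum and minimum values of the d^{th} dimension variable X_{new}^{d}. The chaotic disturbance is applied to an individual as:

$$X'_{new} = \frac{X' + X_{new}}{2}$$

where X^{'} is the individual that needs chaotic disturbance, X_{new} is the generated chaotic disturbance, and X'_{new} is the individual after the disturbance.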
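A minimal sketch of Tent-map population initialization with the random perturbation is given below. The piecewise Tent map and the mapping into the variable range follow the description in this section; clamping the sequence value at 1.0, the choice of N_{T} as the total number of generated components, and the three hyperparameter ranges are assumptions made only for illustration.

```python
import random


def tent_step(z, n_t, rng):
    """One step of the Tent map plus the perturbation rand(0, 1) / n_t."""
    base = 2.0 * z if z <= 0.5 else 2.0 * (1.0 - z)
    # Clamping at 1.0 is an assumption of this sketch; it keeps z inside [0, 1].
    return min(base + rng.random() / n_t, 1.0)


def tent_population(n, bounds, seed=0):
    """Generate n individuals; bounds holds one (d_min, d_max) pair per dimension."""
    rng = random.Random(seed)
    n_t = n * len(bounds)  # assumed number of particles in the chaotic sequence
    z = rng.random()
    population = []
    for _ in range(n):
        individual = []
        for d_min, d_max in bounds:
            z = tent_step(z, n_t, rng)
            individual.append(d_min + (d_max - d_min) * z)  # map into the range
        population.append(individual)
    return population


def chaotic_disturbance(x, x_new):
    """Average an individual with a freshly generated chaotic point."""
    return [(a + b) / 2.0 for a, b in zip(x, x_new)]


# Hypothetical search ranges: learning rate, iterations, hidden nodes.
bounds = [(1e-4, 1e-1), (10.0, 200.0), (8.0, 128.0)]
population = tent_population(5, bounds)
```

Integer-valued hyperparameters such as the number of hidden nodes would still need rounding before being passed to the network.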

The Gaussian variation is derived from the Gaussian distribution. The original parameter values are replaced by a random number drawn from the normal distribution with mean μ and variance σ^{2}.

The optimization speed of the standard SSA algorithm is affected by the non-uniformity of the logistic map's traversal, which reduces the optimization efficiency, whereas the values of the improved Tent map are more uniformly distributed. As the characteristics of the normal distribution show, Gaussian mutation concentrates its search on a local area near the original individual, which helps the algorithm find the global minimum point efficiently and accurately and improves the robustness of the algorithm.
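The local character of Gaussian mutation can be illustrated with a short sketch. The multiplicative form x · (1 + N(0, 1)) used here is a common choice in chaotic sparrow search variants and is an assumption, as is the greedy rule that keeps a mutant only when it improves the fitness; the function names are hypothetical.

```python
import random


def gaussian_mutation(x, rng):
    """Scale every component by (1 + a standard normal draw)."""
    return [v * (1.0 + rng.gauss(0.0, 1.0)) for v in x]


def mutate_if_better(x, fitness, rng):
    """Keep the mutant only if it has a lower (better) fitness value."""
    candidate = gaussian_mutation(x, rng)
    return candidate if fitness(candidate) < fitness(x) else x


def squared_norm(x):
    """Toy fitness used only for this example."""
    return sum(v * v for v in x)


rng = random.Random(1)
start = [0.8, -0.3]
improved = mutate_if_better(start, squared_norm, rng)
```

Since most draws from N(0, 1) are small in magnitude, most mutants stay close to the original individual, which matches the local search behaviour described above.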

In short-term load forecasting, the current load value is associated with information from both historical and future time steps. In this paper, the BiLSTM network, which considers bidirectional time information, is selected as the underlying model for short-term load forecasting. TPA is introduced to compensate for the failure of the traditional attention mechanism to extract the interdependence between multiple variables at different times and sequences. In addition, because hyperparameters are essential to the prediction model, CSSA is introduced to optimize them while the model is being constructed.

Herein, the minimized mean square error between the expected output and the actual output of the BiLSTM-TPA network is used as the fitness function, that is, to find a set of network hyperparameters to minimize the error of BiLSTM-TPA. The BiLSTM-TPA model structure is shown in

To optimize the hyperparameters of BiLSTM, CSSA is introduced. Firstly, the BiLSTM decodes the parameters passed in by CSSA to obtain the learning rate, the number of iterations, and the number of nodes in each hidden layer. After the network model is trained, the test set samples are predicted to obtain the mean square error between the actual and expected output values. This mean square error is passed back to the CSSA as the fitness value, the global optimal solution is iteratively updated according to the fitness, and the optimized network hyperparameters are finally obtained. The chaotic sparrow search algorithm flow is shown in

Step 1: Initialize the population and the number of iterations, and initialize the proportion of discoverers and adders.

Step 2: Apply the Tent chaotic sequence in

Step 3: Calculate the fitness of each sparrow, and find the best position and fitness and the worst position and fitness.

Step 4: Select the top N sparrows with the best fitness as discoverers and the rest as adders, and update the locations of the discoverers and the adders according to SSA.

Step 5: Randomly select M sparrows for early warning and update the location.

Step 6: Update individual performance using Gaussian variation in

Step 7: Update the positions and fitness of the whole population and re-sort it according to the current state of the sparrow population.

Step 8: Determine whether the termination condition is met; if so, exit and output the results; otherwise, return to Step 4.
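The eight steps can be sketched as a compact loop. This is a simplified illustration under stated assumptions, not the authors' implementation: the toy `sphere` fitness stands in for the validation error of the network, the parameter names `pd`, `sd`, and `st` (discoverer share, early-warning share, safety threshold) and their values are hypothetical, the update rules follow the commonly published SSA formulas in simplified form, and Gaussian mutation is applied greedily to every individual.

```python
import math
import random


def sphere(x):
    """Toy fitness. The paper instead uses the error of the BiLSTM-TPA network."""
    return sum(v * v for v in x)


def clip(x, lo, hi):
    return [min(max(v, lo), hi) for v in x]


def cssa(fitness, dim, lo, hi, pop=20, iters=50, pd=0.2, sd=0.1, st=0.8, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: Tent-map initialisation, mapped into [lo, hi].
    z, X = rng.random(), []
    for _ in range(pop):
        row = []
        for _ in range(dim):
            base = 2.0 * z if z <= 0.5 else 2.0 * (1.0 - z)
            z = min(base + rng.random() / (pop * dim), 1.0)
            row.append(lo + (hi - lo) * z)
        X.append(clip(row, lo, hi))
    n_disc, n_warn = max(1, int(pd * pop)), max(1, int(sd * pop))
    best_x = min(X, key=fitness)[:]
    best_f = fitness(best_x)
    history = [best_f]
    for _ in range(iters):
        X.sort(key=fitness)                       # Step 3: best first, worst last
        worst_x = X[-1][:]
        r2 = rng.random()                         # Step 4: alarm value for discoverers
        for i in range(n_disc):
            if r2 < st:
                shrink = math.exp(-(i + 1) / (rng.uniform(1e-3, 1.0) * iters))
                X[i] = clip([v * shrink for v in X[i]], lo, hi)
            else:
                q = rng.gauss(0.0, 1.0)
                X[i] = clip([v + q for v in X[i]], lo, hi)
        leader = min(X[:n_disc], key=fitness)[:]
        for i in range(n_disc, pop):              # Step 4: adders follow or relocate
            if i + 1 > pop / 2:
                q = rng.gauss(0.0, 1.0)
                new = [q * math.exp((w - v) / (i + 1) ** 2) for w, v in zip(worst_x, X[i])]
            else:
                step = sum(abs(v - p) * rng.choice((-1, 1)) for v, p in zip(X[i], leader)) / dim
                new = [p + step for p in leader]
            X[i] = clip(new, lo, hi)
        f_worst = fitness(worst_x)
        for i in rng.sample(range(pop), n_warn):  # Step 5: early warning
            f_i = fitness(X[i])
            if f_i > best_f:
                beta = rng.gauss(0.0, 1.0)
                new = [b + beta * abs(v - b) for v, b in zip(X[i], best_x)]
            else:
                k = rng.uniform(-1.0, 1.0)
                new = [v + k * abs(v - w) / (abs(f_i - f_worst) + 1e-12) for v, w in zip(X[i], worst_x)]
            X[i] = clip(new, lo, hi)
        for i in range(pop):                      # Step 6: greedy Gaussian mutation
            mutant = clip([v * (1.0 + rng.gauss(0.0, 1.0)) for v in X[i]], lo, hi)
            if fitness(mutant) < fitness(X[i]):
                X[i] = mutant
        cand = min(X, key=fitness)                # Step 7: update the global best
        if fitness(cand) < best_f:
            best_x, best_f = cand[:], fitness(cand)
        history.append(best_f)                    # Step 8: stop after iters rounds
    return best_x, best_f, history


best_x, best_f, history = cssa(sphere, dim=3, lo=-5.0, hi=5.0)
```

In the setting of this paper, `fitness` would train the network with the hyperparameters decoded from a position vector and return its error, so each call is far more expensive than the toy function used here.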

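To assess the accuracy of the model, the mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), and determination coefficient (R^{2}) are selected as the criteria for prediction accuracy in this paper, and they are calculated as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where n is the number of samples, y_{i} is the actual value, \hat{y}_{i} is the predicted value, and \bar{y} is the mean of the actual values. R^{2} is used to judge the quality of the model, and the value range is [0, 1]. The larger R^{2} is, the better the prediction results.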
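The four criteria can be computed with a few lines of standard-library Python. The function names and the sample values are hypothetical; the formulas are the usual definitions of MAPE, RMSE, MAE, and R^{2}.

```python
import math


def mape(y, y_hat):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted loads.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
scores = {
    "MAPE": mape(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "R2": r2(actual, predicted),
}
```

For these values the MAPE is 5%, the MAE is about 6.67, the RMSE is about 8.16, and R^{2} is 0.99.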

To demonstrate the validity of the model given in this paper, the results of LSTM, BiLSTM-Attention, BiLSTM-TPA, SSA-BiLSTM-Attention, CSSA-BiLSTM-Attention and IPSO-BiLSTM-Attention models are compared with the results of the method proposed in the paper.

The validity of the proposed method is verified using measured data from Zhejiang Province, from February 13, 2010 to May 20, 2010, and from Shaanxi Province, from May 01, 2021 to August 30, 2021. The load data collected in Zhejiang Province have a time interval of 1 h, and the original dataset A is composed of these data and the meteorological factors collected by local weather stations. The load data collected in Shaanxi Province have a time interval of 15 min, and the original dataset B is composed of the preprocessed load data and the meteorological factors collected by local meteorological stations.

The load data collected from Zhejiang Province are used as a sample to unify the unit of the load data collected in a certain area of Shaanxi Province. Dataset A and dataset B each consist of N × 29 original data: the N × 24 load data plus the average temperature, the maximum temperature, the minimum temperature, the relative humidity, and the week type, where N represents the total sampling time of the data. Meteorological factors can be obtained directly from the data collected by meteorological stations, and the week type is a coefficient assigned according to the electricity consumption of the different day types. To improve the training effect of the model, the data are linearly mapped to [0, 1], as follows:

$$x^{*} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

where x^{*} is the normalized data, x is the original sample data, x_{min} is the minimum sample data, and x_{max} is the maximum sample data.
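The linear mapping to [0, 1] and its inverse, which is needed to turn normalized predictions back into load values, can be sketched as follows. The function names and the sample loads are hypothetical, and returning zeros for a constant series is an assumption made to avoid division by zero.

```python
def min_max_scale(values, x_min, x_max):
    """Map values linearly so that x_min becomes 0 and x_max becomes 1."""
    span = x_max - x_min
    if span == 0:
        return [0.0 for _ in values]  # constant series: assumed convention
    return [(v - x_min) / span for v in values]


def min_max_inverse(scaled, x_min, x_max):
    """Undo the scaling, e.g. to express predictions in the original unit."""
    return [x_min + s * (x_max - x_min) for s in scaled]


loads = [40.0, 50.0, 60.0]  # hypothetical hourly loads
x_min, x_max = min(loads), max(loads)
scaled = min_max_scale(loads, x_min, x_max)
restored = min_max_inverse(scaled, x_min, x_max)
```

The minimum and maximum should be taken from the training data only and reused for the forecast day, so that no information from the day being predicted leaks into the scaling.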

The dataset used is the standard dataset A of a certain place in Zhejiang Province. The average temperature, maximum temperature, minimum temperature, relative humidity, and week type of the day to be predicted on May 20, 2010 are used as reference values. The characteristics of the day to be predicted are taken as the characteristic sequence for grey correlation analysis. The data with a correlation degree greater than 0.7 are selected to form a dataset C similar to the day to be predicted. The correlation analysis is shown in

The model is trained on the data before May 18, 2010 in dataset C. The 29 data values of May 19, 2010, together with the average temperature, maximum temperature, minimum temperature, relative humidity, and week type of the predicted day, are the input, and the 24 load values of May 20, 2010 are the output. CSSA is used for parameter optimization, with the mean square error on the validation set as the fitness function, to find a set of parameters that minimizes the network error.

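The evaluation indexes of the prediction day are shown in the table below. Compared with the LSTM, BiLSTM-AT, BiLSTM-TPA, SSA-BiLSTM-AT, CSSA-BiLSTM-AT, and IPSO-BiLSTM-AT models, the R^{2} index of the proposed CSSA-BiLSTM-TPA model increased by 2.44, 1.85, 1.23, 0.39, 0.25, and 0.88 percentage points, respectively.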

Models | MAPE | RMSE | MAE | R^{2} |
---|---|---|---|---|
LSTM | 2.82% | 2.335 | 2.054 | 96.84% |
BiLSTM-AT | 2.24% | 2.104 | 1.738 | 97.43% |
BiLSTM-TPA | 2.17% | 1.835 | 1.472 | 98.05% |
SSA-BiLSTM-AT | 1.50% | 1.385 | 1.069 | 98.89% |
CSSA-BiLSTM-AT | 1.41% | 1.295 | 1.055 | 99.03% |
IPSO-BiLSTM-AT | 2.08% | 1.660 | 1.487 | 98.40% |
CSSA-BiLSTM-TPA | 1.23% | 1.117 | 0.876 | 99.28% |

To verify the improvement brought by the TPA mechanism in short-term power load forecasting, the BiLSTM-AT model based on the traditional attention mechanism and the BiLSTM-TPA model based on the temporal pattern attention mechanism were trained under the same conditions, and the MAPE, RMSE, MAE, and R^{2} were used to evaluate the prediction accuracy. Compared with the BiLSTM-AT model, the MAPE of the BiLSTM-TPA model decreased by 0.07 percentage points, the RMSE by 0.269, and the MAE by 0.266, and the R^{2} increased by 0.62 percentage points. The prediction accuracy of the BiLSTM model with the TPA mechanism is therefore higher than that with the traditional attention mechanism. Although different input variables have different characteristics, the traditional attention mechanism assigns the same attention weight to the different characteristics of the input variables and cannot consider the proportion of different variables at different time steps. The TPA mechanism, on the other hand, performs feature extraction on the rows of the hidden state matrix through the convolution layer, enabling the model to learn the interdependence between multiple variables within the same time step and across all previous times and sequences. Therefore, the TPA mechanism can select relevant information for each input variable from different time steps.
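The differences quoted in this comparison can be checked directly against the Zhejiang results table. The short sketch below copies the two relevant rows from that table; the dictionary layout and function name are hypothetical. MAPE and R^{2} are percentages, so their differences are percentage points, while RMSE and MAE differences are in the unit of the load.

```python
# (MAPE %, RMSE, MAE, R^2 %) copied from the dataset A results table.
results_a = {
    "BiLSTM-AT": (2.24, 2.104, 1.738, 97.43),
    "BiLSTM-TPA": (2.17, 1.835, 1.472, 98.05),
}


def gains(baseline, model, table):
    """Error reductions and R^2 increase of `model` relative to `baseline`."""
    b_mape, b_rmse, b_mae, b_r2 = table[baseline]
    m_mape, m_rmse, m_mae, m_r2 = table[model]
    return {
        "MAPE drop (points)": round(b_mape - m_mape, 3),
        "RMSE drop": round(b_rmse - m_rmse, 3),
        "MAE drop": round(b_mae - m_mae, 3),
        "R2 gain (points)": round(m_r2 - b_r2, 3),
    }


tpa_gain = gains("BiLSTM-AT", "BiLSTM-TPA", results_a)
```

The same function can be applied to any other pair of rows once they are added to the dictionary.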

In the comparison of the CSSA-BiLSTM-AT, SSA-BiLSTM-AT, and BiLSTM-AT models, the MAPE, RMSE, and MAE of the CSSA-BiLSTM-AT and SSA-BiLSTM-AT models are significantly lower than those of the BiLSTM-AT model, and the R^{2} is increased by 1.60 and 1.46 percentage points, respectively, indicating that the sparrow search algorithm and its improved variant work well for the hyperparameter optimization of the BiLSTM-AT model. Comparing the CSSA-BiLSTM-AT model with the SSA-BiLSTM-AT model, the MAPE is reduced by 0.09 percentage points, the RMSE by 0.090, and the MAE by 0.014, and the R^{2} is increased by 0.14 percentage points, which verifies that the CSSA algorithm has higher optimization accuracy than the SSA algorithm. Finally, comparing the proposed CSSA-BiLSTM-TPA model with the IPSO-BiLSTM-AT model, the MAPE is reduced by 0.85 percentage points, the RMSE by 0.543, and the MAE by 0.611, and the R^{2} is increased by 0.88 percentage points. These results verify that the proposed model performs better than the traditional prediction methods. The prediction results and real values of different models are shown in

To verify the applicability of our method, it is applied to data from another province. The dataset used is the standard dataset B of a certain place in Shaanxi Province. The average temperature, maximum temperature, minimum temperature, relative humidity, and week type of the day to be predicted, August 30, 2021, are taken as the reference values, and the characteristics of that day are taken as the characteristic sequence for grey correlation analysis. The data with a correlation degree greater than 0.7 are selected to form dataset D, similar to the day to be predicted.

CSSA optimizes the model hyper-parameters, and the fitness function is stabilized at 0.008 after iteration, as shown in

The prediction results between the proposed method and other models are shown in

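To verify the rationality and stability of the proposed model, the evaluation indicators of each model are shown in the table below. Compared with the LSTM, BiLSTM-AT, BiLSTM-TPA, SSA-BiLSTM-AT, CSSA-BiLSTM-AT, and IPSO-BiLSTM-AT models, the R^{2} index of the proposed CSSA-BiLSTM-TPA model was improved by 9.68, 8.22, 6.24, 4.96, 2.66, and 3.06 percentage points, respectively.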

Models | MAPE | RMSE | MAE | R^{2} |
---|---|---|---|---|
LSTM | 14.04% | 7.803 | 6.666 | 82.10% |
BiLSTM-AT | 14.19% | 7.478 | 6.391 | 83.56% |
BiLSTM-TPA | 13.19% | 7.013 | 5.764 | 85.54% |
SSA-BiLSTM-AT | 13.12% | 6.696 | 5.863 | 86.82% |
CSSA-BiLSTM-AT | 11.31% | 6.083 | 4.930 | 89.12% |
IPSO-BiLSTM-AT | 10.97% | 6.195 | 5.252 | 88.72% |
CSSA-BiLSTM-TPA | 10.23% | 5.287 | 4.238 | 91.78% |

In addition, we compared the BiLSTM-AT model based on the traditional attention mechanism and the BiLSTM-TPA model based on the temporal pattern attention mechanism under the same conditions to verify the versatility and stability of the TPA mechanism in short-term power load prediction. After training, MAPE, RMSE, MAE, and R^{2} were used as the evaluation indicators of prediction accuracy. Compared with the BiLSTM-AT model, the MAPE of the BiLSTM-TPA model was reduced by 1.00 percentage point, the RMSE by 0.465, and the MAE by 0.627, and the R^{2} was improved by 1.98 percentage points. These data again verify that the prediction accuracy of the BiLSTM model with the TPA mechanism is higher than that with the traditional attention mechanism.

Comparing the CSSA-BiLSTM-AT, SSA-BiLSTM-AT, and BiLSTM-AT models shows that the CSSA-BiLSTM-AT and SSA-BiLSTM-AT models have significantly lower MAPE, RMSE, and MAE and a significantly higher R^{2} than the BiLSTM-AT model. This shows the applicability and stability of the sparrow search algorithm and its improved variant. Comparing the prediction evaluation indicators of the CSSA-BiLSTM-AT model and the SSA-BiLSTM-AT model further verifies that the CSSA algorithm has higher optimization accuracy than the SSA algorithm. Finally, compared with the IPSO algorithm, the CSSA algorithm yields better prediction results in this paper.

This paper proposes a short-term power load forecasting method in which the chaotic sparrow search algorithm optimizes a BiLSTM model with a temporal pattern attention mechanism. 1) Through the grey relational analysis method, the internal relationship of the data on the forecast day is analyzed, and the datasets similar to the forecast day are selected to reduce the difficulty of network processing and the interference caused by noisy data. Combined with the BiLSTM model, the bidirectional LSTM layer is used to fully extract the time series features, which further improves the prediction accuracy; 2) The TPA mechanism uses convolution kernels to extract important features from the hidden feature matrix of the BiLSTM. At the same time, considering the influence of different variables on the predicted variable, it selects relevant information from different time steps for different features, which brings higher prediction accuracy than models based on the traditional attention mechanism; 3) The improved particle swarm optimization algorithm (IPSO) is introduced for comparison with the CSSA algorithm, which shows that the CSSA algorithm achieves better prediction accuracy in this paper. Compared with other prediction models, the CSSA-BiLSTM-TPA model is more accurate in network hyperparameter optimization and effectively improves the accuracy of short-term power load prediction on different datasets.