Accurate load forecasting forms a crucial foundation for implementing household demand response plans and optimizing load scheduling. When dealing with short-term load data characterized by substantial fluctuations, a single prediction model struggles to capture temporal features effectively, resulting in diminished prediction accuracy. In this study, a hybrid deep learning framework integrating an attention mechanism, a convolutional neural network (CNN), improved chaotic particle swarm optimization (ICPSO), and long short-term memory (LSTM) is proposed for short-term household load forecasting. First, the CNN model is employed to extract features from the original data, enhancing the quality of data features. Subsequently, the moving average method is used for data preprocessing, followed by the application of the LSTM network to predict the processed data. Moreover, the ICPSO algorithm is introduced to optimize the parameters of the LSTM, boosting the model's running speed and accuracy. Finally, the attention mechanism is employed to optimize the output of the LSTM, effectively addressing the information loss induced by lengthy sequences and further elevating prediction accuracy. The numerical analysis verifies the accuracy and effectiveness of the proposed hybrid model. It explores data features adeptly, achieving superior prediction accuracy compared with other forecasting methods for household loads exhibiting significant fluctuations across different seasons.

Electricity load forecasting involves using statistics, machine learning, and other methodologies to predict future changes in load by analyzing existing electricity consumption data [

Currently, there exist various methods for load forecasting, typically categorized into traditional statistical methods and advanced machine learning techniques [

However, while the aforementioned methods are simple and practical, they place stringent demands on raw data processing and the stability of time series. Furthermore, their effectiveness in capturing nonlinear influencing factors is limited, making them more suitable for scenarios with fewer influencing factors. Modern machine learning methods exhibit proficiency in addressing nonlinear problems and offer greater advantages in short-term load forecasting performance. These methods encompass the expert system approach, support vector machine (SVM), and artificial neural network (ANN) [

In recent years, the deep learning method derived from ANNs has garnered significant attention due to its high data processing ability, making it a popular approach in the field of load forecasting [

Although the aforementioned studies have improved the accuracy of predictions through combined prediction methods, they have overlooked the influence of parameter settings in LSTM on prediction accuracy. In reality, the number of neurons and other parameters have a significant impact on prediction accuracy, and manually adjusting these parameters can easily miss the optimal combination. Fan et al. [

In this study, a short-term load forecasting model combining CNN-ICPSO-LSTM and the attention mechanism is proposed. This method utilizes CNN to extract effective feature vectors from historical load sequences and the LSTM network to model and learn the dynamic changes of these features. The attention mechanism then assigns different probability weights to the LSTM hidden states, enhancing the influence of important information on load demand. Additionally, the LSTM parameters are optimized by the ICPSO algorithm to further improve the prediction efficiency of the model. By combining multiple forecasting methods with complementary advantages, the model analyzes and processes household electricity load data to achieve more accurate predictions.

CNN is a deep learning model characterized by convolutional computation and a deep structure. It learns representations and extracts higher-order features from the input information. By utilizing local connections and weight sharing, CNN processes the original data more deeply and abstractly, leading to more effective feature extraction.

The structure of CNN is shown in

The LSTM network is an improved version of the RNN model. It adds multiple gate structures, most notably the forget gate, which enable LSTM to process and remember longer time series effectively. They allow the model to filter out irrelevant information from previous time steps, retaining crucial information while discarding less important details. This overcomes the vanishing and exploding gradient problems typically encountered in traditional networks, and the added memory units efficiently store relevant information. The LSTM model exhibits strong generalization capability, learning effectively on both large and small datasets, and excels at solving nonlinear problems. The gate structure in the LSTM model enables the deletion or addition of information to the cell state. Each gate acts as a selective mechanism that controls the information flow, and its activation is primarily determined by the

The LSTM model consists of three gate structures: the forget gate, the input gate, and the output gate. The forget gate is primarily used to regulate the selection of memory information from the previous time step and the current input information. The memory unit utilizes the

The calculation details of LSTM are as follows. The forget gate can be described by

where

The input gate is shown in

The memory unit (information transmission)

The output gate

In
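For illustration, the gate computations described above can be sketched as a single LSTM time step in NumPy. This is a minimal sketch, not the implementation used in this study; the stacked parameter layout (W, U, b) and the gate ordering are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W (4H, D), U (4H, H), b (4H,) hold the stacked
    parameters of the four gates (forget f, input i, candidate g, output o)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b          # stacked pre-activations, shape (4H,)
    f = sigmoid(z[0:H])                   # forget gate: what to keep from c_prev
    i = sigmoid(z[H:2*H])                 # input gate: what new info to admit
    g = np.tanh(z[2*H:3*H])               # candidate memory content
    o = sigmoid(z[3*H:4*H])               # output gate: what to expose
    c = f * c_prev + i * g                # updated cell (memory) state
    h = o * np.tanh(c)                    # hidden state passed to the next step
    return h, c
```

Iterating `lstm_step` over a load sequence and collecting the hidden states `h` yields the inputs that the attention layer later re-weights.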

The attention mechanism is a resource allocation mechanism that simulates the human brain’s attention. It allows the model to focus its attention on the most important part of the input data while disregarding the unimportant parts. By calculating the relationship between the input and output of the LSTM hidden layer, the attention mechanism generates a weight vector that represents the importance of each input at the current moment. This weight vector is then used to compute the weighted input vector, which in turn generates the attention output. The core idea behind attention is to combine the output vectors of the LSTM with the vectors in the input sequence, enabling the model to prioritize important information in the input sequence. In the attention mechanism, each vector of the input sequence is assessed for similarity with the LSTM hidden layer output, generating a probability distribution that denotes the significance of each input. This probability distribution can be computed using the

The structure of the attention mechanism is illustrated in
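As an illustration of the probability-weighting idea described above, a minimal dot-product attention over LSTM hidden states can be sketched as follows. Using the last hidden state as the query is an assumption for this sketch, not necessarily the scoring function used in the paper.

```python
import numpy as np

def attention(hidden_states, query):
    """hidden_states: (T, H) LSTM outputs; query: (H,), e.g. the last hidden state.
    Scores each time step by dot-product similarity with the query, turns the
    scores into a probability distribution with softmax, and returns the
    weighted sum (context vector) together with the weights."""
    scores = hidden_states @ query                   # similarity per step, (T,)
    scores -= scores.max()                           # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax probabilities
    context = weights @ hidden_states                # weighted input vector, (H,)
    return context, weights
```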

In this study, the ICPSO algorithm is utilized to enhance the optimization speed of LSTM and further improve the model's accuracy. This optimization is carried out during the training process of the prediction model. In each iteration of the ICPSO algorithm, superior particles are stored in the elite database. The steepest descent method is employed to quickly identify values close to the optimal solution, thereby preventing premature convergence of the algorithm. This approach enhances the efficiency and effectiveness of the optimization process.

In this algorithm, the position of any particle

where

To prevent particles from clustering around local extreme values and getting trapped in local optima, the ICPSO algorithm maintains a balance between the global and local search capabilities of particles by adjusting the inertia weight.

where

The population fitness variance and a chaotic perturbation strategy are integrated into the inertia weight transformation of the ICPSO algorithm. The population fitness variance, which reflects the dispersion among particles, is used to evaluate the degree of particle agglomeration, which is shown as

where the initial value
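A minimal sketch of these ingredients is given below. The linearly decreasing inertia-weight schedule and the logistic chaotic map are common formulations assumed for illustration; the exact transformation used in the paper may differ.

```python
import numpy as np

def fitness_variance(fitness):
    """Population fitness variance: values near zero mean the particles have
    clustered, a warning sign of premature convergence."""
    f_avg = fitness.mean()
    dev = fitness - f_avg
    f_norm = max(1.0, np.abs(dev).max())   # normalization factor
    return np.sum((dev / f_norm) ** 2)

def inertia_weight(w_max, w_min, it, max_it):
    """Linearly decreasing inertia weight: large early (global search),
    small late (local refinement)."""
    return w_max - (w_max - w_min) * it / max_it

def logistic_map(z, mu=4.0):
    """Logistic chaotic map on [0, 1], used to perturb particles when the
    fitness variance signals clustering around a local optimum."""
    return mu * z * (1.0 - z)
```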

The parameter optimization process of ICPSO-LSTM is illustrated in

(1) The experimental data is divided into training data, validation data, and testing data.

(2) The adaptive ICPSO algorithm is initialized, and the initial LSTM model is constructed based on the parameters associated with each particle in the algorithm. After defining the optimization objective, the model is trained using the training data, and the optimization results are assessed using the validation data. The fitness value of each particle is calculated as the mean absolute percentage error of the prediction results. The objective function of the ICPSO algorithm is shown in

where

(3) Update the particle positions using the ICPSO algorithm, and store the updated optimal position values in the elite database. Take 30% of the total number of elite particles in the elite database and optimize these particles using the steepest descent method. The optimization results are then used to update the elite database again. If the iteration limit is reached, substitute the results into the LSTM model for prediction; otherwise, continue with the optimization process.

(4) Output the final optimization results.
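Steps (1)–(4) can be sketched as follows. The true fitness, the validation MAPE of an LSTM trained with each particle's hyperparameters, is replaced here by a toy quadratic so the sketch stays self-contained, and the 30% elite refinement uses the analytic gradient of that toy fitness as a stand-in for the steepest-descent step.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(params):
    """Stand-in for the validation MAPE of an LSTM trained with these
    hyperparameters (e.g. neuron count, learning rate); a bowl with
    minimum at 3 for each dimension."""
    return float(np.sum((params - 3.0) ** 2))

def icpso(dim=2, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(0, 6, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    elite = []                                   # elite database of good positions
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        f = np.array([fitness(p) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
        elite.append(gbest.copy())               # store the best position found
        # refine the top 30% of the elite database with a steepest-descent step
        k = max(1, int(0.3 * len(elite)))
        for j in np.argsort([fitness(e) for e in elite])[:k]:
            grad = 2.0 * (elite[j] - 3.0)        # analytic gradient of the toy fitness
            elite[j] = elite[j] - 0.1 * grad     # descend toward the optimum
    best = min(elite, key=fitness)
    return best, fitness(best)
```

In the actual pipeline, `fitness` would train an LSTM with the particle's parameters on the training set and return the validation MAPE, and the final `best` parameters would be substituted into the LSTM for prediction.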

The structure of the prediction model is depicted in

Each layer is described in detail as follows:

1) Input layer. The preprocessed historical load data is used as the input layer of the prediction model. The load data has a length of n and is preprocessed before being input into the model, which can be represented by

2) The CNN layer. The CNN layer is primarily responsible for extracting features from the input historical sequence. The CNN framework consists of two one-dimensional convolution layers, two maximum pooling layers, and one fully connected layer. To accommodate the characteristics of the load data, convolution layers 1 and 2 are designed as one-dimensional convolutions and employ the

where
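The one-dimensional convolution and max-pooling operations of the CNN layer can be sketched in NumPy as follows; the ReLU activation and "valid" padding are assumptions made for this illustration.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1-D convolution with ReLU (assumed activation).
    x: (L,) load sequence; kernels: (K, k); bias: (K,) -> feature map (K, L-k+1)."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (L-k+1, k)
    return np.maximum(windows @ kernels.T + bias, 0.0).T      # ReLU, then (K, L-k+1)

def maxpool1d(fmap, size=2):
    """Non-overlapping max pooling along the time axis, halving the length
    for size=2 while keeping the strongest local responses."""
    K, L = fmap.shape
    L = (L // size) * size
    return fmap[:, :L].reshape(K, -1, size).max(axis=2)
```

Stacking two such convolution/pooling pairs and flattening into a fully connected layer reproduces the overall shape of the CNN framework described above.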

3) ICPSO-LSTM layer. The ICPSO-LSTM layer is utilized to learn the feature vectors extracted by the CNN layer. The model employs a single-layer LSTM structure to perform deep learning on the extracted feature vector to capture its internal variation pattern. Furthermore, the ICPSO optimization algorithm is employed to optimize the parameters of the LSTM. The output of the ICPSO-LSTM layer is denoted as

4) Attention layer. The input to the attention mechanism layer is the activated output vector

where

5) Output layer. The input to the output layer is the output of the attention mechanism layer. The output layer calculates the output

where

The flowchart of the forecasting model is shown in

(1) Data preprocessing: This step involves various tasks, especially noise reduction. In this study, the electricity load data has periodic similarity along the time axis and no continuous mutations, so the moving average method has been employed for noise reduction. It smooths the data curve by averaging the data over a sliding window, thereby reducing the impact of noise.

(2) Model training: The data is divided into a training set and a testing set. The training set is used to train the model. The data from the training set is inputted into the model, where the CNN layer performs feature extraction, and the LSTM layer learns the extracted feature vectors. Additionally, the ICPSO algorithm is employed to find optimal parameters for LSTM, thereby enhancing the training speed.

(3) Forecast result output: The attention mechanism determines the weight values for output, and an error analysis is conducted before outputting the load prediction value.

The load dataset of a household in Shanghai from January to December of a given year was collected by the local utility company. The dataset consists of 96 data points per day, collected at 15-minute intervals. A subset of the data is chosen for model training and forecasting. Based on the moving average method, the raw data is denoised to smooth the data and prevent singular points from affecting load forecasting. The specific operation process is as follows:

(1) Calculate the historical average value of the load, as shown in

(2) According to principle

where

(3) If

where
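The denoising steps above can be sketched as follows. The specific threshold rule, replacing points that deviate from the moving average by more than k standard deviations, is an assumption made for illustration.

```python
import numpy as np

def moving_average_denoise(load, window=4, k=3.0):
    """(1) compute the moving (historical) average; (2) flag points whose
    deviation from it exceeds k standard deviations as singular points;
    (3) replace flagged points with the moving-average value."""
    load = np.asarray(load, dtype=float)
    kernel = np.ones(window) / window
    avg = np.convolve(load, kernel, mode="same")   # moving average of the load
    resid = load - avg
    mask = np.abs(resid) > k * resid.std()         # singular-point criterion
    cleaned = load.copy()
    cleaned[mask] = avg[mask]                      # replace outliers
    return cleaned
```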

Moreover, to facilitate the training of the model network, the min-max normalization method is applied to normalize the original data within the range of (−1, 1), using the following calculation formula:

where
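The min-max normalization into (−1, 1), together with the inverse transform needed to recover load values from the model output, can be written as:

```python
import numpy as np

def minmax_scale(x, lo=-1.0, hi=1.0):
    """Scale x linearly into [lo, hi]; keep (min, max) so predictions
    can be mapped back to the original load units."""
    x_min, x_max = x.min(), x.max()
    scaled = lo + (x - x_min) * (hi - lo) / (x_max - x_min)
    return scaled, (x_min, x_max)

def minmax_inverse(scaled, bounds, lo=-1.0, hi=1.0):
    """Undo minmax_scale using the stored bounds."""
    x_min, x_max = bounds
    return x_min + (scaled - lo) * (x_max - x_min) / (hi - lo)
```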

To assess the accuracy of the model’s predictions, the mean absolute percentage error (MAPE) and root mean square error (RMSE) are employed as evaluation criteria, as shown in

where
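The two evaluation criteria can be implemented directly:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the load (kW here)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```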

To validate the superiority and reliability of the proposed model for short-term load forecasting, eight comparison models are established: (1) the LSTM method; (2) the CNN-GRU combined prediction method without the attention mechanism; (3) the CNN-LSTM combined prediction method without the attention mechanism; (4) the CNN-PSO-GRU combined prediction method based on the attention mechanism; (5) the CNN-PSO-LSTM combined prediction method based on the attention mechanism; (6) the CNN-ICPSO-LSTM combined prediction method based on the attention mechanism; (7) the CNN-PSO-BiLSTM combined prediction method based on the attention mechanism; (8) the CNN-ICPSO-BiLSTM combined prediction method based on the attention mechanism.

In this study, all computational programs have been run on a laptop computer configured with an Intel Core i7-13700HX processor and 16 GB of RAM.

After analyzing the initial load data, it is evident that the household load exhibits clear periodic changes corresponding to seasonal variations, with significant fluctuations during summer and winter. Therefore, the household load data in summer and winter are chosen for prediction and comparison.

To verify the validity and stability of the forecasting models, a random week from each season of the dataset is chosen for daily load forecasting. Single-day forecasting results alone may not directly indicate the stability of a forecasting model; therefore, the daily load forecasting results for the seven days of each week are analyzed on average. The performance indicators of the different forecasting models are presented in

Date | Evaluation criteria | LSTM | CNN-GRU | CNN-LSTM | CNN-PSO-GRU | CNN-PSO-LSTM | CNN-ICPSO-LSTM | CNN-PSO-BiLSTM | CNN-ICPSO-BiLSTM |
---|---|---|---|---|---|---|---|---|---|
3/09–3/15 | MAPE/% | 9.209 | 7.910 | 7.648 | 3.274 | 6.726 | 2.353 | 2.574 | 2.311 |
3/09–3/15 | RMSE/kW | 0.00983 | 0.00846 | 0.00818 | 0.00285 | 0.00305 | 0.00260 | 0.00277 | 0.00168 |
6/15–6/21 | MAPE/% | 23.020 | 22.136 | 21.837 | 10.448 | 12.943 | 9.989 | 10.961 | 9.126 |
6/15–6/21 | RMSE/kW | 0.00930 | 0.00850 | 0.00820 | 0.00463 | 0.00597 | 0.00420 | 0.00418 | 0.00363 |
9/14–9/20 | MAPE/% | 27.281 | 22.654 | 21.448 | 9.558 | 7.6041 | 6.950 | 7.594 | 6.596 |
9/14–9/20 | RMSE/kW | 0.00990 | 0.00854 | 0.00817 | 0.00445 | 0.00401 | 0.00390 | 0.00421 | 0.00379 |
12/14–12/20 | MAPE/% | 8.955 | 7.656 | 7.413 | 3.531 | 6.025 | 2.190 | 3.906 | 2.178 |

In terms of the error comparison,

Moreover, based on the results presented in

In this study, the proposed CNN-ICPSO-LSTM model utilizes CNN for feature extraction tasks, ensuring the retention of crucial data features, while LSTM is employed to capture the interdependencies among the data. In situations where there are significant load fluctuations, the coupling characteristics among features are leveraged to mitigate prediction errors. Additionally, ICPSO is employed to optimize the LSTM parameters, enabling the identification of the optimal parameter combination and improving the operational efficiency of the model. The attention mechanism further enhances the prediction accuracy by assigning weights that highlight the influence of important features.

To conduct a comprehensive comparison, a typical day in both summer and winter, characterized by significant load fluctuations, is selected to compare the load prediction results using different forecasting methods, as illustrated in

(1) LSTM exhibits a lower fitting degree in comparison to other methods. It is challenging for a single forecasting method to accurately predict load data with substantial fluctuations, and the forecasting results for time series with distinctive characteristics are subpar. Nonetheless, the overall trend of the curve remains relatively consistent with the true values.

(2) There is minimal disparity between CNN-LSTM and CNN-GRU; both exhibit improved fits and a closer alignment with the true values in terms of the overall trend. This suggests that combined prediction yields higher accuracy than individual prediction, and that for load data with distinct temporal characteristics, the combined models outperform LSTM alone.

(3) The inclusion of the PSO algorithm improves the prediction accuracy of the original model, and the ICPSO algorithm makes this advantage even more pronounced.

(4) The load trend can be adequately captured by each forecasting model within regions where the load change is relatively steady. However, significant discrepancies between the predicted and actual values are evident in areas characterized by more drastic load changes, particularly in the vicinity of load peaks and troughs.

Generally, in comparison with the other methods, the load forecasting approach proposed in this study demonstrates a significantly better fit. It delivers more accurate predictions for load data with evident time-series characteristics, resulting in smaller errors. Moreover, it performs well on all typical days despite their significantly different load profiles, demonstrating satisfactory robustness.

In addition, it is interesting to note that the predicted values of most models are lower than the true values. This is considered a coincidence of the specific input data, with no clear theoretical basis to support it. It is also found that the forecasting accuracy in winter is better than in summer. This may be due to the greater impact of temperature and humidity on summer loads, which produces stronger fluctuations than in winter; this fluctuation increases the difficulty of load forecasting and reduces prediction accuracy.

Taking the load prediction results on a typical summer day as an example, the errors and training times of different models are compared and analyzed, as presented in

Prediction model | MAPE/% | RMSE/kW | Training time (s) |
---|---|---|---|
LSTM | 25.487 | 0.00982 | 492 |
CNN-GRU | 21.837 | 0.00844 | 892 |
CNN-LSTM | 21.136 | 0.00817 | 970 |
CNN-PSO-GRU | 10.448 | 0.00413 | 952 |
CNN-PSO-LSTM | 12.943 | 0.00597 | 1165 |
CNN-ICPSO-LSTM | 9.989 | 0.00392 | 766 |
CNN-PSO-BiLSTM | 10.961 | 0.00418 | 1513 |
CNN-ICPSO-BiLSTM | 9.126 | 0.00363 | 1332 |

To provide a clearer explanation of the error results for each model,

As shown in

The aforementioned analysis confirms that the proposed model exhibits higher accuracy and clear advantages when dealing with load data that fluctuates significantly across different seasons. Additionally, long-term time series prediction results play a vital role in household load prediction. To compare and validate the performance of the models, load data from three days in both summer and winter are employed. The prediction results of the different forecasting models are illustrated in

When extending the forecasting period to 3 days, the proposed method demonstrates a closer alignment with the actual load trend and accurately predicts load fluctuations. It exhibits superior performance during peak and valley periods with significant load variations, accurately analyzing changes in load data at peak and valley values, and showcasing a high level of fitting with the actual load values. As a result, more precise prediction results are achieved.

Scenario | Evaluation criteria | LSTM | CNN-GRU | CNN-LSTM | CNN-PSO-GRU | CNN-PSO-LSTM | CNN-ICPSO-LSTM | CNN-PSO-BiLSTM | CNN-ICPSO-BiLSTM |
---|---|---|---|---|---|---|---|---|---|
Summer single-day forecast | MAPE/% | 25.487 | 21.837 | 21.136 | 10.448 | 12.943 | 9.989 | 10.961 | 9.126 |
Summer single-day forecast | RMSE/kW | 0.00982 | 0.00844 | 0.00817 | 0.00413 | 0.00597 | 0.00392 | 0.00418 | 0.00363 |
Summer three-day forecast | MAPE/% | 26.788 | 22.994 | 22.255 | 11.692 | 10.656 | 10.164 | 9.661 | 8.732 |
Summer three-day forecast | RMSE/kW | 0.00983 | 0.00845 | 0.00818 | 0.00582 | 0.00460 | 0.00256 | 0.00316 | 0.00244 |
Winter single-day forecast | MAPE/% | 17.810 | 14.240 | 9.401 | 3.531 | 6.025 | 2.190 | 3.906 | 3.661 |
Winter single-day forecast | RMSE/kW | 0.00982 | 0.00845 | 0.00816 | 0.00495 | 0.00830 | 0.00255 | 0.00841 | 0.00322 |
Winter three-day forecast | MAPE/% | 18.053 | 14.351 | 9.677 | 8.1382 | 11.692 | 2.254 | 6.661 | 2.213 |
Winter three-day forecast | RMSE/kW | 0.00982 | 0.00846 | 0.00817 | 0.00884 | 0.0936 | 0.00259 | 0.00881 | 0.00246 |

Based on the aforementioned discussion, the model proposed in this study not only ensures prediction accuracy but also maintains prediction stability. Consequently, it exhibits a strong fitting ability to the actual values. In addition, the model demonstrates high accuracy in both daily and long-term time series prediction, further validating its robustness.

To address the challenges posed by the large volume of household load data, a hybrid deep learning framework integrating the CNN-ICPSO-LSTM model and the attention mechanism is proposed for short-term electricity load forecasting. From the simulation results, the following conclusions can be drawn:

(1) The prediction method proposed in this study not only exhibits significantly lower errors in terms of MAPE and RMSE compared to other prediction methods but also demonstrates a higher degree of curve fitting to the true values in the prediction results. These findings indicate that the proposed method can enhance the accuracy of short-term load forecasting.

(2) The hybrid model exhibits a relatively high level of complexity, resulting in an increased running time compared to a single prediction model. However, compared to the CNN-PSO-GRU and CNN-PSO-LSTM models, the proposed model reduces the overall running time by employing ICPSO to optimize LSTM parameters. This indicates that the proposed method also enhances the training speed of the model.

(3) After the prediction duration is increased, all methods experience an increase in error. However, the method proposed in this study exhibits a lower error growth rate, suggesting its capability to predict longer time series with high accuracy.


The research work was supported by the Shanghai Rising-Star Program (No. 22QA1403900), the National Natural Science Foundation of China (No. 71804106), and the Non-carbon Energy Conversion and Utilization Institute under the Shanghai Class IV Peak Disciplinary Development Program.

The authors confirm contribution to the paper as follows: study conception and design: L. Ma, L. Wang, and H. Ren; data collection: S. Zeng and Y. Zhao; analysis and interpretation of results: C. Liu and H. Zhang; draft manuscript preparation: L. Ma, L. Wang and Q. Wu. All authors reviewed the results and approved the final version of the manuscript.

All data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.

The authors declare that they have no conflicts of interest to report regarding the present study.