This paper proposes a spatio-temporal model (VCGA) based on variational mode decomposition (VMD) and an attention mechanism. The proposed prediction model combines a squeeze-and-excitation network, which extracts spatial features, with a gated recurrent unit, which captures temporal dependencies. First, VMD reduces the instability of the original wind speed data, while the attention mechanism strengthens the impact of important information. Together, VMD and the attention mechanism help avoid a decline in prediction accuracy. Finally, VCGA is trained on the decomposed components and derives the final result by merging the predictions of the individual components. Comparative experiments on short-term prediction with an actual wind power dataset show that VCGA outperforms prior algorithms.

Currently, there is a growing need to utilize renewable energy to address future energy shortages. Thus, new energy systems are replacing many traditional power generation systems. As one of the most promising, abundant, and environmentally friendly renewable resources, wind energy has gained enormous attention from governments and enterprises worldwide [

The accurate prediction of short-term wind speed is essential to the operation and control of wind power systems. It aids the appropriate siting of wind power grid connections, reduces the voltage and frequency fluctuations caused by wind power variation, and improves the reliability of power grid operation [

However, because wind power is intermittent, volatile, and uncertain, the aforementioned methods are usually combined with specific processing methods to obtain relatively stable subsequences in practical applications. The variational mode decomposition (VMD) method [

In recent years, there have been remarkable achievements in wind speed prediction. Beyond the usual temporal correlation, spatial correlations, an essential feature of wind speed, have gained considerable research attention. Consequently, spatial and temporal correlation analysis has become a research hotspot [

The main contributions of this paper are as follows.

The VMD method is employed to process the wind speed data. The unstable wind speed sequence is thereby transformed into relatively stable subsequences, which improves the wind speed prediction accuracy.

Because irrelevant features in the data degrade model performance, the attention mechanism is used to redistribute the feature weights and improve model performance.

The underlying architecture of VCGA is composed of a CNN and a GRU. This model handles both the temporal and spatial characteristics of wind speed and exploits spatio-temporal correlations in wind speed prediction, enhancing the prediction accuracy.

The remainder of the paper is organized as follows. Section 2 gives the basic principles of VMD and the attention mechanism, together with the background theory of CNN and GRU. Section 3 introduces the spatio-temporal data model of wind speed used in VCGA, the hybrid deep learning framework, and the integration of the attention mechanism into this framework. Section 4 presents the experiments, in which VCGA is compared with related algorithms. Finally, Section 5 summarizes the paper and points out directions for further research.

In 2014, K. Dragomiretskiy and D. Zosso proposed VMD, an adaptive, quasi-orthogonal, and completely non-recursive decomposition method [

To solve this variational problem, the alternating direction method of multipliers (ADMM) [

The essence of VMD is a variational problem, which mainly includes the construction and solution of the variational problem, and its process is as follows.

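First, we construct the variational problem. Suppose that the original signal $f$ is decomposed into $K$ components, where each mode has a limited bandwidth around a central frequency and the sum of the estimated bandwidths of all modes is minimized. The preprocessed space-time data of wind speed is taken as the input signal $f$, and the constrained variational problem is

$$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t).$$

Here, $\{u_k\}$ and $\{\omega_k\}$ correspond to the $K$ modes and their central frequencies, respectively, $\delta(t)$ is the Dirac function, and $*$ denotes convolution.

Therefore, to solve this constrained problem, a quadratic penalty factor $\alpha$ and a Lagrange multiplier $\lambda$ are introduced, which gives the augmented Lagrangian

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle.$$

The parameters $\{u_k\}$, $\{\omega_k\}$, and $\lambda$ are iteratively updated by the alternating direction method of multipliers. The formulas are as follows:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2},$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega},$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right).$$

In these formulas, a hat denotes the Fourier transform, $n$ is the iteration index, and $\tau$ is the update step of the multiplier.

Finally, once the iteration converges, the modes $u_k$ can be obtained.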
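The alternating updates of the modes, the central frequencies, and the multiplier can be illustrated with a short NumPy sketch. It is a simplified version of the standard frequency-domain VMD iteration, not the implementation used in this paper: it skips the boundary mirroring found in published implementations, runs a fixed number of iterations instead of testing convergence, and the names `vmd`, `alpha`, `tau`, and `n_iter` are hypothetical.

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.0, n_iter=200):
    """Simplified VMD sketch: returns (modes, omega) for a 1-D real signal.

    Assumptions: even-length signal, no boundary mirroring, fixed iteration
    count. alpha is the bandwidth penalty, tau the multiplier step size.
    """
    T = len(signal)
    freqs = np.arange(T) / T - 0.5            # shifted axis in [-0.5, 0.5)
    f_hat = np.fft.fftshift(np.fft.fft(signal))
    f_hat_plus = f_hat.copy()
    f_hat_plus[: T // 2] = 0                  # keep non-negative frequencies
    u_hat = np.zeros((K, T), dtype=complex)   # mode spectra
    omega = 0.5 * np.arange(K) / K            # initial central frequencies
    lam = np.zeros(T, dtype=complex)          # Lagrange multiplier spectrum
    pos = slice(T // 2, T)                    # indices of frequencies >= 0

    for _ in range(n_iter):
        for k in range(K):
            # Mode update: Wiener-like filter centred on omega[k]
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat_plus - others + lam / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2
            )
            # Central-frequency update: power-weighted mean frequency
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / np.sum(power)
        # Multiplier update (dual ascent); tau = 0 disables it
        lam = lam + tau * (f_hat_plus - u_hat.sum(axis=0))

    # One-sided spectrum -> real mode (factor 2 restores the amplitude)
    modes = 2 * np.real(np.fft.ifft(np.fft.ifftshift(u_hat, axes=1), axis=1))
    return modes, omega

# Toy usage: two cosines at normalized frequencies 5/512 and 40/512
T = 512
t = np.arange(T) / T
toy = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 40 * t)
toy_modes, toy_omega = vmd(toy, K=2)
```

On this toy signal the two recovered central frequencies should settle near the two cosine frequencies. Real wind speed series are not periodic, so a practical implementation would also need boundary handling and a convergence test.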

CNN adopts the method of local connection and weight sharing to process the original data at a higher and more abstract level, which effectively and automatically extracts the internal features of data [

LSTM and GRU are both variants of the RNN [

Here, $z_t$ and $r_t$ represent the update gate and reset gate, respectively; $x_t$ is the input and $h_t$ is the output of the hidden layer.

_{t}._{1}, _{2}, ⋅ ⋅ ⋅ ⋅ ⋅ , _{t}] is the output of the fully connected layer, _{t−1} and the current input _{t}, and
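For readers unfamiliar with the GRU, the gate computations can be sketched as a single recurrent step in NumPy. This follows one common textbook convention for the state update (libraries differ on which of the two terms the update gate multiplies), and the parameter names in the dictionary `p` are hypothetical, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step; p holds weight matrices W_*, U_* and bias vectors b_*."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])  # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])  # reset gate
    # Candidate state: the reset gate controls how much history is used
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])
    # New state: interpolation between the old state and the candidate
    return (1.0 - z_t) * h_prev + z_t * h_cand

def init_params(n_in, n_hidden, rng):
    """Small random parameters, for illustration only."""
    p = {}
    for g in ("z", "r", "h"):
        p[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p

def run_gru(xs, p, n_hidden):
    """Run the cell over a sequence xs of shape (steps, n_in)."""
    h = np.zeros(n_hidden)
    outputs = []
    for x_t in xs:
        h = gru_step(x_t, h, p)
        outputs.append(h)
    return np.stack(outputs)
```

Because each new state is a convex combination of the previous state and a `tanh` output, the hidden values stay inside (-1, 1) when the initial state is zero.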

The attention mechanism [

$x_n$ represents the input of the network, $h_n$ corresponds to the output of the hidden layer generated by each input through the network, and $a_t$ represents the probability distribution value of the attention mechanism for the output of the hidden layer.

The attention mechanism is typically used in RNN architectures to improve the model performance and has contributed massively to time series prediction. For example, the research in [

First, the VCGA algorithm preprocesses the original data to acquire the original spatio-temporal wind speed series of the target site. Second, it applies the VMD method to decompose this series into IMF components. Then, the attention mechanism is employed to improve the CNN-GRU network. Finally, the new network predicts each IMF component, and the component predictions are summed to obtain the final value.
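The last step, predicting each IMF component separately and summing the results, can be sketched as follows. The function names are hypothetical, and the `persistence` predictor is only a placeholder standing in for the trained network.

```python
import numpy as np

def forecast_by_components(components, forecast_fn):
    """Forecast each component and sum the results.

    components: array of shape (K, T), one row per IMF component.
    forecast_fn: maps a 1-D series to a scalar next-step forecast
                 (a placeholder for a trained per-component model).
    """
    per_component = np.array([forecast_fn(c) for c in components])
    return float(per_component.sum()), per_component

def persistence(series):
    """Naive placeholder predictor: repeat the last observed value."""
    return series[-1]

imfs = np.array([[1.0, 2.0, 3.0],
                 [0.5, 0.5, 0.5]])
total, parts = forecast_by_components(imfs, persistence)
```

Summation is appropriate here because the VMD components add up (approximately) to the original series.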

Two functional modules make up the new hybrid network model, and both incorporate an attention mechanism to improve performance. One module takes the CNN model as its core and adds an SE block to extract the spatial characteristics of wind speed. The other module uses the GRU model to capture temporal dependencies and adds an attention layer to avoid information loss in the time series.

Wind speed data usually exhibit both temporal and spatial correlations. On the one hand, the temporal correlation implies that the wind speed at a given location depends on its own past values. On the other hand, the spatial correlation indicates that the wind speeds at different sites within a specific geographical scale are not statistically independent. Even though the wind speed differs from site to site, wind speed data's temporal and spatial functions are continuous [

One difficulty is retaining the spatio-temporal correlation of data without increasing the amount of data. Dimension reduction is an efficient solution [_{t} ∈ ^{M×N}(1 ≤ _{t} ∈ ^{M×N} can be defined as

Suppose that

This matrix approach is adopted here and subsequently extended with VMD. We assume that the time window length is T and that the number of IMF components is K.

If at a time t, we can denote the value of the site (_{k} component as

It is important to note that a single SWSM does not involve any time information because all its elements are observed simultaneously in a single SWSM. Therefore, by organizing SWSMs in chronological order, we can construct a spatio-temporal sequence describing the array’s wind speed. As shown in
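Organizing SWSMs chronologically into model inputs can be sketched as a sliding window over a time-ordered stack of matrices. This is an illustrative assumption about the data layout: the function name `make_windows`, the `horizon` parameter, and the use of the full matrix as the target are not taken from the paper.

```python
import numpy as np

def make_windows(swsm_seq, T, horizon=1):
    """Build (input window, target) pairs from a chronological SWSM stack.

    swsm_seq: array of shape (time, M, N), one M x N matrix per time step.
    T: window length (number of consecutive SWSMs per sample).
    horizon: how many steps ahead the target lies after the window.
    Returns X of shape (samples, T, M, N) and y of shape (samples, M, N).
    """
    n = swsm_seq.shape[0] - T - horizon + 1
    X = np.stack([swsm_seq[i : i + T] for i in range(n)])
    y = np.stack([swsm_seq[i + T + horizon - 1] for i in range(n)])
    return X, y

# Toy usage: 20 time steps of a 10 x 10 array, windows of 6 steps
seq = np.random.default_rng(0).uniform(0.0, 35.0, size=(20, 10, 10))
X, y = make_windows(seq, T=6, horizon=2)
```

With VMD applied first, the same windowing would be repeated for each of the K IMF components.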

The analysis of the SWSM shows that the wind prediction model needs a spatial model to extract spatial features and a temporal model to capture temporal correlation. Building on improvements to previous models, we propose the VCGA model to make more accurate predictions. The model is divided into input, SENet, GRU, Attention, and output layers.

The input data of the input layer is the spatial data of the IMF component obtained by VMD decomposition. Then squeeze-and-excitation networks (SENet) [_{S} = [_{S1}, _{S2}, …, _{Si}], and the length of the SENet layer is

Here is how to get _{S} by SENet layer.

In _{c} represents the c-th convolution kernel, _{1}_{1} is _{2} is also a full-connection process, and the dimension of _{2} is _{c} is the weight reflecting the importance of each feature channel, and the weight coefficients of each channel can be learned through _{c} ⋅ _{c}.
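The squeeze-and-excitation computation can be illustrated with a minimal NumPy sketch of the generic SE block: global average pooling per channel, two fully connected layers, and channel-wise rescaling. The weight shapes, the reduction factor, and all names are assumptions made for illustration and may differ from the configuration used in this paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(u, W1, b1, W2, b2):
    """Generic squeeze-and-excitation block on a feature map u of shape (H, W, C).

    W1: (C_reduced, C) and W2: (C, C_reduced) are the two fully connected layers.
    Returns the rescaled feature map and the channel weights s of shape (C,).
    """
    z = u.mean(axis=(0, 1))                      # squeeze: one value per channel
    hidden = np.maximum(W1 @ z + b1, 0.0)        # first FC layer with ReLU
    s = sigmoid(W2 @ hidden + b2)                # second FC layer, weights in (0, 1)
    return u * s, s                              # scale each channel by its weight

# Toy usage: 8 channels reduced to 4 in the bottleneck (hypothetical sizes)
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(5, 5, 8))
W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros(4)
W2, b2 = rng.normal(scale=0.1, size=(8, 4)), np.zeros(8)
scaled, weights = se_block(feature_map, W1, b1, W2, b2)
```

The learned weights `s` play the role of the per-channel importance described above: channels with small weights are suppressed before the features reach the GRU.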

In the VCGA algorithm, the GRU is adopted as the upper layer of the hybrid deep learning framework and receives the spatial features extracted by SENet. A single-layer GRU structure is constructed to learn the extracted features and capture their internal variation rules. Simultaneously, the Dropout method [

The input of the Attention layer is the output vector $h_t$ produced by the GRU network layer. The weight parameter matrix is updated iteratively, and the probabilities corresponding to the different feature vectors are calculated according to the weight allocation principle. In the weight-coefficient formula of the attention layer, $a_t$ represents the attention probability distribution value determined by the output vector $h_t$ of the GRU network layer at time t.
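A common way to compute such attention weights over GRU outputs is additive scoring followed by a softmax. The sketch below uses that generic form as an assumption, since the exact scoring function of this paper is not reproduced here; the parameter names `w`, `b`, and `u` are hypothetical.

```python
import numpy as np

def attention_pool(H, w, b, u):
    """Additive attention over a sequence of hidden states.

    H: (steps, d) GRU outputs, one row per time step.
    w: (d, d_a), b: (d_a,), u: (d_a,) are learnable parameters.
    Returns the weighted context vector (d,) and the weights (steps,).
    """
    scores = np.tanh(H @ w + b) @ u          # one scalar score per time step
    scores = scores - scores.max()           # stabilise the softmax numerically
    a = np.exp(scores) / np.exp(scores).sum()
    context = a @ H                          # probability-weighted sum of states
    return context, a

# Toy usage with random states and parameters
rng = np.random.default_rng(0)
H = rng.normal(size=(7, 4))
context, a = attention_pool(H, rng.normal(size=(4, 3)), np.zeros(3), rng.normal(size=3))
```

The weights are non-negative and sum to one, so they can be read as a probability distribution over time steps.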

Finally, the input of the output layer is the output of the Attention layer. Then, the output _{1}, _{2}, ⋅ ⋅ ⋅ ⋅ ⋅, _{m}] ^{T} with the prediction step of

In the output-layer formula, $y_t$ represents the predicted output value at time t, $w_o$ is the weight matrix, and $b_o$ is the bias vector; the Sigmoid function was selected as the activation function of the Dense layer.

The VCGA model selects the Adam (adaptive moment estimation) [

In _{i} is the actual value of; and

The dataset used in this paper is from the Wind Integration National Dataset provided by the National Renewable Energy Laboratory. We selected wind speed data recorded at 5 min intervals in 2012 for a 10 × 10 wind turbine array in a wind farm in Wyoming, USA, and then reset the time interval to 10 min for prediction. The resulting dataset contains 52,560 samples, with a highest wind speed of 35.48 m/s and a lowest wind speed of 0.01 m/s. The training, validation, and test sets are the first 60%, the following 10%, and the last 30% of the data, respectively.
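The resampling and chronological split can be sketched as below. How the interval was reset from 5 to 10 min is not stated, so the sketch assumes that every second sample is kept (averaging adjacent pairs would be an equally plausible choice); the function name and arguments are hypothetical.

```python
import numpy as np

def resample_and_split(series_5min, step=2, train=0.6, val=0.1):
    """Downsample a 5 min series and split it chronologically.

    Assumption: the 10 min series keeps every `step`-th sample.
    The split is by position, with no shuffling, so the test set
    always lies after the training and validation sets in time.
    """
    series = series_5min[::step]
    n = len(series)
    i_train = int(round(n * train))
    i_val = int(round(n * (train + val)))
    return series[:i_train], series[i_train:i_val], series[i_val:]

# Toy usage: 105,120 five-minute samples reduce to 52,560 ten-minute samples
raw = np.arange(105120, dtype=float)
train_set, val_set, test_set = resample_and_split(raw)
```

Splitting by position rather than at random keeps future observations out of the training set, which matters for time series evaluation.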

In the experiment, the GPU server is configured as NVIDIA GeForce RTX 2080 Ti, 11 GB video memory, 11G E5–2678, 24-core CPU, 440 GB SSD, and 4 TB hard disk. The development environment uses TensorFlow 2.4, Keras 2.4.2, and Python 3.7.

Index | Type | Configurations |
---|---|---|
1 | Convolution layer | kernels: 20; kernel size: 3 × 3; stride: 1 × 1 |
2 | Max-pooling layer | pooling size: 2 × 2; stride: 2 × 2 |
3 | SENet layer | filter: 50; ratio: 0.5 |
4 | Convolution layer | kernels: 200; kernel size: 2 × 2; stride: 1 × 1 |
5 | Fully connected layer | units: 20 |
6 | GRU layer | hidden units: 200 |

For wind speed prediction, we choose root mean square error (RMSE) [

In _{i} is the actual value.
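For reference, the two error metrics in their usual form can be computed as follows. Expressing MAPE in percent is an assumption that matches the magnitude of the values reported in the results tables.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the data (m/s here)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must be non-zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)

# Toy usage with two observations
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

MAPE divides by the actual value, so it becomes unstable when wind speeds are close to zero, whereas RMSE weights large absolute errors more heavily.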

To verify the superiority of the proposed VCGA, we compared the VCGA algorithm with similar algorithms for processing spatio-temporal data, including the CNN-GRU algorithm, VMD-CNN-GRU algorithm (VCG) and CNN-GRU-Attention algorithm (CGA) [

Model (RMSE, m/s) | 20 min | 30 min | 60 min | 120 min |
---|---|---|---|---|
LSTM | 1.107 | 1.396 | 1.964 | 3.160 |
GRU | 1.013 | 1.317 | 1.942 | 3.016 |
CNN-GRU | 0.874 | 1.246 | 1.781 | 2.574 |
VMD-CNN-GRU | 0.860 | 1.167 | 1.683 | 2.182 |
VCGA | 0.804 | 1.084 | 1.465 | 2.049 |

Model (MAPE, %) | 20 min | 30 min | 60 min | 120 min |
---|---|---|---|---|
LSTM | 14.851 | 17.234 | 26.747 | 42.123 |
GRU | 12.160 | 17.347 | 25.496 | 42.890 |
CNN-GRU | 9.421 | 14.088 | 24.617 | 32.867 |
VMD-CNN-GRU | 10.210 | 14.843 | 20.436 | 30.132 |
VCGA | 9.690 | 12.689 | 19.219 | 28.348 |

First, the temporal algorithms LSTM and GRU are compared. For prediction horizons of 20, 30, 60, and 120 min, the RMSE of GRU is 8%, 6%, 2%, and 5% lower than that of LSTM, respectively, with an average decrease of 5.25%. At the 20 and 60 min horizons, the MAPE of GRU is lower than that of LSTM by 18% and 5%, respectively; at the 30 and 120 min horizons, GRU increases the MAPE by 1% and 1.8%, respectively, compared with LSTM. Thus, GRU performs better than LSTM most of the time in this wind prediction task, which is why we choose GRU instead of LSTM to process time-domain data. In addition, the prediction performance of the spatio-temporal model is significantly better than that of the temporal model according to the values of RMSE and MAPE in

To assess the contribution of VMD more directly, we compare the RMSE and MAPE of the VCG model with those of the CNN-GRU model. At every prediction horizon, the RMSE of VCG is lower than that of CNN-GRU, with gaps of 2%, 7%, 9%, and 15%, respectively, and an average of 8.3%. In terms of MAPE, however, VCG is inferior to CNN-GRU at the 20 and 30 min horizons: compared with the CNN-GRU model, its prediction errors increase by 7.8% and 5.1%, respectively. At the 60 and 120 min horizons, by contrast, VCG's MAPE is 12.5% lower than that of the CNN-GRU model on average. Similarly, at the 120 min prediction horizon, VCGA improves RMSE and MAPE by 6.7% and 7.1%, respectively, compared with the CGA model, which does not use VMD. These results indicate that VMD decomposition mitigates the randomness and non-stationarity of wind speed and is most beneficial for short-term wind speed prediction at longer horizons.

Furthermore, compared with the CNN-GRU model, which does not include the attention mechanism, CGA reduces the RMSE and MAPE by an average of 9.9% and 10.4%, respectively, at the 60 and 120 min prediction horizons. This comparison supports the benefit of the attention mechanism in short-term wind speed prediction. Likewise, the RMSE and MAPE values of VCGA are generally lower than those of the other algorithms, including VCG. This indicates that the attention mechanism improves the accuracy of short-term wind speed prediction and that the method in this paper has better prediction performance and practical application potential.
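The percentage comparisons in this section are relative reductions of the form (baseline − model) / baseline. As a worked example, the sketch below applies that formula to the LSTM and GRU RMSE values from the results table; the helper name is hypothetical, and rounding may differ slightly from the figures quoted in the text.

```python
def relative_reduction(baseline, model):
    """Percentage by which `model` errors are lower than `baseline` errors.

    Negative values mean the model error is higher than the baseline.
    """
    return [(b - m) / b * 100.0 for b, m in zip(baseline, model)]

horizons_min = [20, 30, 60, 120]
rmse_lstm = [1.107, 1.396, 1.964, 3.160]   # from the RMSE table
rmse_gru = [1.013, 1.317, 1.942, 3.016]    # from the RMSE table

for h, r in zip(horizons_min, relative_reduction(rmse_lstm, rmse_gru)):
    print(f"{h:>3} min: GRU RMSE is {r:.1f}% lower than LSTM")
```

The same helper can be applied to any pair of rows in the RMSE or MAPE tables.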

From

For better intuitive inspecting, the differences between VCGA and other algorithms,

This paper proposes a VCGA algorithm for short-term wind speed prediction. This algorithm uses VMD to stabilize the wind speed series to obtain IMF components. The included attention mechanism aims to reduce the computational burden and better extract features. The SENet layer extracts the spatial characteristics from IMF components. Then, the GRU network connected with the attention mechanism is used to extract time-domain features. Finally, wind speed predictions are obtained after merging spatial and time-domain features.

The simulation results show that VCGA can exploit the spatio-temporal characteristics of wind speed series and effectively improve the accuracy of short-term wind speed prediction, which gives it broad application potential. In future studies, additional factors that may affect wind prediction, such as temperature, humidity, and altitude, will be considered, and the model developed here will be modified accordingly to obtain more accurate predictions. In addition, we will investigate how to introduce other models, such as BiGRU, into the framework to better adapt to the wind speed prediction setting and improve forecasting performance.

This paper is supported by the undergraduate training program for innovation and entrepreneurship of NUIST (XJDC202110300239).