An operating condition recognition approach for wind turbine spindles is proposed, driven by normal supervisory control and data acquisition (SCADA) data. First, the raw SCADA data of the wind turbine under full working conditions are cleaned and their features are extracted. Then, with spindle speed as the output parameter, single and combined normal behavior models of the wind turbine spindle are constructed in turn from the pre-processed data, and the optimal model is selected using evaluation indexes. Finally, the spindle operation status index is calculated according to the sliding window principle, and the threshold for identifying abnormal spindle operation status is determined under the small probability event hypothesis. SCADA data from a 2.5 MW wind turbine at a domestic wind farm are analyzed as a sample. The results show that the early warning model issues a fault warning 5.7 h ahead of the actual fault occurrence, and that abnormal wind turbine spindle operation can be identified and warned of without abnormal data or a priori knowledge of related faults.

In recent years, wind power has expanded rapidly and assumed an increasingly prominent position in the energy structure [

Approaches to wind turbine operational condition monitoring include vibration signal monitoring [

Therefore, this work proposes a method for wind turbine spindle operation status recognition and early warning driven by SCADA data, in which status identification relies entirely on SCADA data from normal operation and uses no failure data. Based on an analysis of the SCADA data from a domestic 2.5 MW direct drive wind turbine, the spindle speed is selected as the output parameter for spindle operation status recognition. The work focuses on extracting the feature parameters relevant to spindle speed and establishing a prediction model to implement spindle operation status recognition and failure warning. In this work,

The F24 wind turbine of a domestic wind farm is taken as the research object, and a total of 54607 sets of data from 0:00 on July 01, 2018 to 0:00 on July 01, 2019 are selected as the original research data. The format of the original SCADA data is shown in

No. | Time | Power/(kW) | Wind speed/(m·s^{−1}) | Spindle speed/(r·min^{−1}) | Air temperature/(°C) | … |
---|---|---|---|---|---|---|
1 | 00:00 | 1257 | 8.4083 | 14.7507 | 25.6158 | … |
2 | 00:10 | 1335.63 | 8.4203 | 14.7547 | 25.4162 | … |
3 | 00:20 | 1199.71 | 8.1536 | 14.7294 | 25.4131 | … |
… | … | … | … | … | … | … |

In accordance with the wind speed-power characteristic curve and the distribution characteristics of abnormal data, the QM-DBSCAN method is used to exclude abnormal data induced by extreme weather, component failure, and wind curtailment. QM-DBSCAN is a method for identifying and cleaning wind speed-power data. It combines the quartile method (QM) with density-based spatial clustering of applications with noise (DBSCAN) to clean abnormal data in a differentiated way, according to the category and characteristics of the wind speed-power data clusters. The cleaning effect is shown in

As shown in
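To make the quartile part of the cleaning step concrete, the following is a minimal Python sketch of quartile-based filtering of wind speed-power pairs. It is an illustration under stated assumptions, not the authors' QM-DBSCAN implementation: the function name `quartile_clean`, the 0.5 m/s wind-speed bin width, and the 1.5 × IQR fence are hypothetical choices, and the DBSCAN stage is omitted.

```python
from collections import defaultdict
from statistics import quantiles


def quartile_clean(records, bin_width=0.5, k=1.5):
    """Keep (wind_speed, power) pairs whose power lies inside the IQR fence
    of their wind-speed bin. bin_width and k are assumed values."""
    # Group power readings by wind-speed bin.
    bins = defaultdict(list)
    for wind_speed, power in records:
        bins[int(wind_speed // bin_width)].append(power)

    # Compute the lower and upper fence for every bin with enough points.
    fences = {}
    for bin_id, powers in bins.items():
        if len(powers) < 4:
            continue  # too few points to estimate quartiles; keep them all
        q1, _, q3 = quantiles(powers, n=4, method="inclusive")
        iqr = q3 - q1
        fences[bin_id] = (q1 - k * iqr, q3 + k * iqr)

    # Retain only the records inside the fence of their bin.
    kept = []
    for wind_speed, power in records:
        bin_id = int(wind_speed // bin_width)
        if bin_id not in fences:
            kept.append((wind_speed, power))
            continue
        low, high = fences[bin_id]
        if low <= power <= high:
            kept.append((wind_speed, power))
    return kept
```

In a full pipeline, the points that survive this filter would still need a density-based pass to handle clustered anomalies such as curtailment bands, which a per-bin quartile fence alone may not separate.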

The SCADA system records operating data at 10 min intervals. The data mainly comprise monitoring parameters such as wind speed, output power and spindle speed, and these parameters have different dimensions and units. To eliminate the influence of differing dimensions between parameters, the monitoring data must be normalized after cleaning. The formula is as follows:

In

No. | Time | Power | Wind speed | Spindle speed | Air temperature | … |
---|---|---|---|---|---|---|
1 | 00:00 | 0.5028 | 0.5334 | 0.7974 | 0.7585 | … |
2 | 00:10 | 0.5342 | 0.5346 | 0.7979 | 0.7522 | … |
3 | 00:20 | 0.4799 | 0.5083 | 0.7950 | 0.7521 | … |
… | … | … | … | … | … | … |
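The normalization step can be sketched as follows. Because the formula itself is not reproduced here, min-max scaling to [0, 1] is an assumption, suggested by the fact that all normalized values above lie in that interval. The function name `min_max_normalize` and its optional `lo` and `hi` bounds are hypothetical. The tabulated power values are roughly consistent with bounds near 0 and the 2500 kW rated power, but the actual bounds used are not given.

```python
def min_max_normalize(values, lo=None, hi=None):
    """Scale values to [0, 1] with min-max normalization.

    lo and hi default to the observed minimum and maximum; passing them
    explicitly lets test data reuse the bounds of the training data.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        raise ValueError("cannot normalize a constant parameter")
    return [(v - lo) / (hi - lo) for v in values]


# Example: three raw power readings (kW) scaled with assumed bounds.
power_kw = [1257.0, 1335.63, 1199.71]
scaled = min_max_normalize(power_kw, lo=0.0, hi=2500.0)
```

Reusing the training bounds for later test data keeps the model inputs on the same scale; with this choice, values outside the training range can fall slightly outside [0, 1].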

The data collected by the wind turbine SCADA system are large in volume, high in dimension and strongly redundant. Feature extraction is therefore carried out on the normalized SCADA data to eliminate irrelevant parameters, using a feature selection approach that combines the Spearman correlation coefficient with Principal Component Analysis (PCA). Because the spindle speed indicates the operating state of the spindle, it is adopted as the output parameter, and the operating parameters related to it are extracted. The Spearman correlation coefficient is as follows:

In

Since the operating parameters influence the spindle speed to varying degrees, the Spearman correlation coefficients between the output parameter (spindle speed) and the remaining 49 monitoring parameters are calculated. The monitoring parameters whose correlation coefficient with spindle speed exceeds 0.6 are selected as the initial characteristic parameters. The ranking results are summarized in

Feature name | Correlation coefficient | Feature name | Correlation coefficient |
---|---|---|---|
Wind speed | 0.8784 | Generator stator V-phase temperature | 0.7325 |
Maximum wind speed | 0.8497 | Generator stator U-phase temperature | 0.6983 |
Maximum power | 0.8208 | Maximum vibration 2B | 0.6676 |
Maximum power factor… | 0.8005 | Maximum vibration 1A | 0.6653 |
Minimum power | 0.7861 | Minimum power | 0.6605 |
Minimum wind speed | 0.7513 | Maximum vibration SSD | 0.6463 |
Minimum power factor | 0.7501 | | |
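The correlation screening can be illustrated with a short Python sketch. It computes the Spearman coefficient as the Pearson correlation of average ranks, which handles ties, and keeps the parameters whose coefficient with the target exceeds the 0.6 cut-off stated in the text. The helper names `average_ranks`, `pearson`, `spearman` and `select_features` are hypothetical, and the sketch is not the authors' R code.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(features, target, threshold=0.6):
    """Return {name: coefficient} for features above the threshold,
    ordered from the strongest to the weakest correlation."""
    scores = {name: spearman(col, target) for name, col in features.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {name: r for name, r in ranked if r > threshold}
```

Here `features` would map each monitoring parameter name to its normalized column and `target` would be the spindle speed column.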

PCA is then applied within feature extraction to derive new variables that retain the information of the original variables while eliminating redundant information among the monitoring data, which enhances the prediction accuracy of the model. The feature value, contribution rate and cumulative contribution rate corresponding to each principal component are shown in

No. | Feature value | Contribution rate (%) | Cumulative contribution rate (%) |
---|---|---|---|
Comp.1 | 1.4830 | 71.0989 | 71.0989 |
Comp.2 | 0.7138 | 16.4691 | 87.5680 |
Comp.3 | 0.4158 | 5.5900 | 93.1581 |
Comp.4 | 0.2357 | 1.7959 | 94.9540 |
Comp.5 | 0.2117 | 1.4490 | 96.4029 |
Comp.6 | 0.1753 | 0.9939 | 97.3969 |
Comp.7 | 0.1572 | 0.7985 | 98.1953 |
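How the cumulative contribution rate drives the choice of the number of principal components can be sketched in a few lines of Python. The contribution rates are copied from the table above. The function name `components_for_threshold` is hypothetical, and the selection threshold actually applied to arrive at seven components is not stated here, so the 98% used in the example is an assumption.

```python
def components_for_threshold(contribution_rates, threshold):
    """Return the smallest number of leading components whose cumulative
    contribution rate (in %) reaches the threshold, or None if it never does."""
    cumulative = 0.0
    for count, rate in enumerate(contribution_rates, start=1):
        cumulative += rate
        if cumulative >= threshold:
            return count
    return None


# Contribution rates (%) of Comp.1 to Comp.7 from the table.
rates = [71.0989, 16.4691, 5.5900, 1.7959, 1.4490, 0.9939, 0.7985]

# With an assumed 98 % threshold, all seven tabulated components are kept.
n_components = components_for_threshold(rates, 98.0)
```

A lower assumed threshold would retain fewer components, which is the usual trade-off between dimensionality and retained variance.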

Extreme Learning Machine (ELM) is a novel single-hidden-layer feed-forward neural network learning algorithm [

The output function of the hidden layer is defined as follows:

Support Vector Machine (SVM) is a supervised machine learning method for solving classification and regression problems [

The SVR learning objective is to find the optimal hyperplane closest to all points at a given interval,

The Elman neural network is a recurrent neural network with local memory units and local feedback connections, which makes it better able to handle dynamically changing data [

At time

Combined prediction models can effectively reduce the effect of random factors in single prediction models and integrate the single models to further improve prediction accuracy. Assuming that a prediction problem has

(1) Solving the relative error

(2) Determining the weight of the relative error

(3) Calculating the entropy value of the relative error

(4) Solving for the weights of each single prediction model
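The four steps above can be sketched in Python as follows. Since the formulae are not reproduced here, the sketch follows one common formulation of the entropy weight method for combined forecasting, which is an assumption: relative errors are capped at 1, the entropy is scaled by 1/ln N, and the weights are derived from the variation degree 1 − h. The function names `entropy_weights` and `combine` are hypothetical.

```python
import math


def entropy_weights(actual, predictions):
    """Weights of single models from the entropy of their relative errors.

    actual: list of N nonzero observed values (N >= 2).
    predictions: dict mapping model name -> list of N predicted values
    (at least two models).
    """
    n, m = len(actual), len(predictions)
    if n < 2 or m < 2:
        raise ValueError("need at least two samples and two models")

    variation = {}
    for name, pred in predictions.items():
        # (1) relative error of each prediction, capped at 1
        err = [min(abs((y - p) / y), 1.0) for y, p in zip(actual, pred)]
        total = sum(err)
        if total == 0:
            variation[name] = 0.0  # a perfect model has no error variation
            continue
        # (2) proportion of each relative error within the model's total
        prop = [e / total for e in err]
        # (3) entropy of the relative error sequence, scaled into [0, 1]
        h = -sum(p * math.log(p) for p in prop if p > 0) / math.log(n)
        variation[name] = max(0.0, 1.0 - h)

    d_sum = sum(variation.values())
    if d_sum == 0:
        return {name: 1.0 / m for name in predictions}
    # (4) a smaller variation degree yields a larger weight; weights sum to 1
    return {name: (1.0 - d / d_sum) / (m - 1) for name, d in variation.items()}


def combine(predictions, weights):
    """Weighted sum of the single-model predictions at each time step."""
    names = list(predictions)
    length = len(predictions[names[0]])
    return [sum(weights[k] * predictions[k][t] for k in names) for t in range(length)]
```

With the weights in hand, `combine` gives the combined prediction as the weighted sum of the single-model outputs.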

After data preprocessing, a total of 42301 sets of normal SCADA data with seven characteristic parameters are obtained and used as the input of each prediction model. The wind turbine spindle speed prediction models are constructed on the R language platform by loading the elmNNRcpp, e1071 and RSNNS packages to build the ELM, SVR and Elman prediction models, respectively. The specific step-by-step flow of each model's establishment is shown in

The optimal parameters of each model trained by the training set data are shown in

Prediction models | Parameter name | Parameter setting |
---|---|---|
ELM | nid | 1000 |
 | actfun | tribas |
 | Init_weights | Uniform_negative |
Elman | size | 8 |
 | maxit | 1000 |
 | learnFuncParams | c(0.1) |
SVR | cost | 5000 |
 | gamma | 0.0005 |

After determining the optimal parameters of each model, the ELM, SVR, and Elman prediction model weights in the combined model are calculated by

With evaluation indicators including the mean absolute percentage error and the coefficient of determination R^{2}, the prediction models were evaluated quantitatively to select the most accurate one, using the following formulae:

Here, R^{2} indicates how well the model explains the variation in spindle speed and takes values in [0,1]. The closer R^{2} is to 1, the better the input variables explain the spindle speed and the higher the prediction accuracy of the model. Because R^{2} has a bounded range, it allows a direct comparison: the R^{2} values of the ELM, SVR, Elman and combined prediction models are 0.9957, 0.9934, 0.9965 and 0.9972, respectively. The R^{2} of every prediction model exceeds 0.99, which indicates that over 99% of the variation in spindle speed can be explained by the seven characteristic parameters, so all four models show high prediction accuracy and favorable reliability.
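For reference, the evaluation indicators can be computed as in the Python sketch below. The surviving text names only the mean absolute percentage error and R^{2}. The root mean square error and mean absolute error are included as an assumption about the two remaining indicators, and all function names are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    n = len(actual)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean square error (assumed indicator)."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error (assumed indicator)."""
    n = len(actual)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / n


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot
```

Lower values of the three error measures and an R^{2} closer to 1 both indicate a better fit, so the four indicators should rank models consistently when one model is clearly better.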

Because wind speed, temperature and other environmental factors vary strongly and randomly during wind turbine operation, a sliding window model is adopted to process the data and avoid false alarms caused by large instantaneous errors. The window width is set to

It is known that the operation state index _{th}] is regarded as a confidence interval with confidence for the operation state index _{th} is expressed as

In accordance with the small probability principle, with _{th}, the wind turbine spindle operation status is evaluated to be abnormal, and the confidence upper limit is regarded as the threshold value here. After establishing the spindle speed prediction model by the normal SCADA data, according to the principle of spindle operation status identification, the sliding window width _{th} for three times in a row or more.
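The identification principle can be illustrated with the Python sketch below. The exact definitions of the status index and the threshold are not reproduced here, so the sketch makes two assumptions: the index is the mean absolute residual within each window, and the threshold is an upper limit of the form mean + z × standard deviation estimated on normal data. The rule of raising an alarm after three or more consecutive exceedances follows the text. All names and the default `z` are hypothetical.

```python
from statistics import mean, stdev


def status_index(actual, predicted, width):
    """Sliding-window index: mean absolute residual in each window (assumed form)."""
    residuals = [abs(a - p) for a, p in zip(actual, predicted)]
    return [mean(residuals[i:i + width]) for i in range(len(residuals) - width + 1)]


def upper_threshold(normal_index, z=3.0):
    """Upper limit of the index on normal data (assumed: mean + z * sample std)."""
    return mean(normal_index) + z * stdev(normal_index)


def first_alarm(index, threshold, consecutive=3):
    """Return the position at which the index has exceeded the threshold
    `consecutive` times in a row, or None if that never happens."""
    run = 0
    for i, value in enumerate(index):
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return i
    return None
```

Requiring several consecutive exceedances trades a short detection delay for robustness against isolated spikes in the residual.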

This work recognizes the spindle speed status of a 2.5 MW wind turbine at a domestic wind farm, using one year of historical SCADA monitoring data from the turbine. The turbine has a cut-in wind speed of 3 m/s, a rated wind speed of 10.7 m/s, a cut-out wind speed of 25 m/s, and a rated power of 2500 kW.

An effective data-driven normal behavior model of spindle speed is established to obtain the residuals that reflect the spindle operation status. The ELM, SVR, Elman and combined normal behavior models of spindle speed are each established in R using the normal data obtained after pre-processing, and the optimal spindle speed model is determined by the evaluation indexes, including R^{2}, as follows:

(1) The 42301 sets of normal data are divided in a 10:1 ratio into a training set and a test set.

(2) Establishing the spindle speed prediction model with R language, determining the initial parameter range of each model, through

As shown in

(3) The entropy value method was employed to determine the weights of the single prediction models in the combined prediction model, and the predictive performance of the combined model is shown in

As can be seen from

(4) The evaluation indexes, including R^{2}, are adopted to evaluate the models, and the comparison of the evaluation indexes of each model is shown in

Prediction model | MAPE | | | R^{2} |
---|---|---|---|---|
ELM | 0.4704 | 0.0212 | 0.0140 | 99.5749% |
SVR | 0.6825 | 0.0241 | 0.0197 | 99.3358% |
Elman | 0.4665 | 0.0174 | 0.0131 | 99.6529% |
Combined model | 0.4568 | 0.0157 | 0.0105 | 99.7171% |

As the comparison of evaluation indexes shows, the R^{2} of the combined model improved by 0.1422%, 0.3813% and 0.0642% compared with the ELM, SVR and Elman single models, respectively.

(5) The combined model is determined to be the optimal prediction model.

Set the sliding window width _{th} as 0.5660. The SCADA monitoring data of the unit for 10 days before the occurrence of fault is selected as the abnormal condition test data, and the SCADA monitoring data of the unit for 10 days of normal operation is selected as the normal condition test data. After normalizing the test data into the combined model for prediction, when

As shown in

To address the difficulty of collecting fault data from in-service wind turbine SCADA systems, this work proposed an approach to wind turbine spindle operation status recognition driven entirely by normal SCADA data. First, the QM-DBSCAN process is employed to clean the raw SCADA data, and the feature parameters are extracted with the spindle speed as the output parameter. Next, ELM, SVR, Elman and combined prediction models were constructed on the R language platform, and the combined prediction model was found to be optimal according to the evaluation indexes, with R^{2} increasing by 0.1422%, 0.3813% and 0.0642% over the ELM, SVR and Elman single models, respectively. Finally, the early warning threshold of the spindle operation status is computed based on the small probability event assumption and verified with actual SCADA data. Such a system can detect potential failures and raise alerts in advance, which has implications for wind turbine spindle running condition recognition and maintenance.

SCADA: Supervisory Control and Data Acquisition

PCA: Principal Component Analysis

ELM: Extreme Learning Machine

SVR: Support Vector Machine Regression

This work was supported by the

The authors declare that they have no conflicts of interest to report regarding the present study.