Electricity demand is also known as load in electric power systems. This article presents a Long-Term Load Forecasting (LTLF) approach for Malaysia. An Artificial Neural Network (ANN) with a 5-layer Multi-Layered Perceptron (MLP) structure has been designed and tested for this purpose. Uncertainties in the input variables and in the ANN model were introduced to obtain predictions for the years 2022 to 2030. Pearson correlation was used to examine the input variables for model construction. The analysis indicates that Primary Energy Supply (PES), population, Gross Domestic Product (GDP) and temperature are strongly correlated. The forecast results of the proposed method (henceforth referred to as UQ-SNN) were compared with those obtained by a conventional Seasonal Auto-Regressive Integrated Moving Average (SARIMA) model. The R^{2} scores for UQ-SNN and SARIMA are 0.9994 and 0.9787, respectively, indicating that UQ-SNN is more accurate in capturing the non-linearity and the underlying relationships between the input and output variables. The proposed method can be easily extended to include other input variables to increase the model complexity and is suitable for LTLF. With the available input data, UQ-SNN predicts that Malaysia will consume 207.22 TWh of electricity, with a standard deviation (SD) of 6.10 TWh, by 2030.
Malaysia is projected to become a net energy importer by 2030 [1]. The traditional power generation mix lacks the renewable energy sources needed to offset the fast depletion of oil. Malaysia is turning to solar energy to enhance the national power generation mix [2]. However, the integration of increasingly large amounts of solar power may pose a challenge to power system planning and operation, as different configurations can result in different requirements for system protection, management, and control to maintain grid stability [3].
Good electricity demand forecasting is essential to the operation and planning of power utilities, and is also vital for energy suppliers, policy makers, financial institutions, and other participants in electric energy generation, transmission, distribution, and markets [4]. Electricity demand forecasts can be split into three categories: short-term, mid-term, and long-term. Short-term load forecasts (STLF) usually cover one hour to one week, mid-term load forecasts (MTLF) usually cover a week to a year, and long-term load forecasts (LTLF) cover more than a year. LTLF is essential for electric power system planning as it affects the scheduling of purchasing new generating units, building new generation facilities, and developing transmission and distribution systems [5].
Auto-Regressive Integrated Moving Average (ARIMA) and SARIMA models are frequently used in electricity demand forecasting [6]. These conventional parametric regression techniques cannot guarantee accurate results, as they suffer from several weaknesses, such as modelling complexity and lack of flexibility [7], and they do not consider the effects of other variables such as economic and demographic factors. To overcome these weaknesses, forecasting methods based on Artificial Intelligence (A.I.), such as Fuzzy Logic, ANN, Expert Systems, Support Vector Machine, and Analytic Hierarchy Process, as well as hybrid methods that combine parametric methods and A.I., have been proposed [8,9]. Signal processing methods such as Empirical Mode Decomposition (EMD) [10] and Fast Ensemble-Decomposed Model (FED) [11] have also been developed to improve the prediction accuracy of LTLF. Although these methods reportedly give more accurate predictions than the conventional ones, any long-term forecast is inaccurate by nature, owing to uncertain and uncontrollable factors that directly and indirectly influence the underlying forecasting process [12]. Nevertheless, uncertainty quantification in LTLF has received little attention, even though it can provide an important risk management reference for policymakers making decisions on power system planning [13].
This paper presents a flexible LTLF framework that combines SARIMA, Latin-Hypercube Sampling (LHS), and ANN to perform LTLF for Malaysia, considering the propagation of model and input uncertainties. The framework is termed UQ-SNN, short for Uncertainty Quantified SARIMA Neural Network. The rest of the paper presents the formulation of the UQ-SNN framework and its rationale, and is organised as follows. The conventional SARIMA model for input variable forecasting is reviewed in Section 2. The data used to construct the input variables for UQ-SNN are described and analysed in Section 3, followed by the modelling of the forecasting engine using ANN in Section 4. The UQ-SNN framework that combines the methods described in Sections 3 and 4 is presented in Section 5, together with a comparison of its performance against a conventional SARIMA model. Conclusions are presented in Section 6.
SARIMA Model
Building on the basic ARIMA model for time series regression, the SARIMA model incorporates seasonal components to account for seasonal behaviour in the time series signals [14,15]. The model is generally expressed in the form SARIMA (p, d, q) × (P, D, Q)_{S}, where p, d, q and P, D, Q are the orders of the Auto-Regression (AR), Integrated (I), and Moving Average (MA) terms for the non-seasonal and seasonal elements, respectively. The subscript S is the number of time steps in a single seasonal period. The AR part describes the correlations between the present and past values, the integrated part handles the non-stationary element of the time series data, and the MA part accounts for the dependencies on the errors of past values. Mathematically, the model is described as follows [12–16]:
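$$\phi_p(B)\,\Phi_P(B^S)\,\nabla^d \nabla_S^D x_t = \theta_q(B)\,\Theta_Q(B^S)\,\epsilon_t$$
where $x_t$ is the forecast variable; $\phi_p(\cdot)$ and $\theta_q(\cdot)$ are the AR and MA polynomials for the non-seasonal components, and $\Phi_P(\cdot)$ and $\Theta_Q(\cdot)$ are the AR and MA polynomials for the seasonal components; $\nabla^d$ and $\nabla_S^D$ are the differencing operators for the non-seasonal and seasonal components, respectively; $B$ is the backshift operator, defined by $B^k x_t = x_{t-k}$; and $\epsilon_t$ is white noise.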
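As an illustration of the backshift and differencing operators, the following minimal sketch applies $(1 - B^S)$ and then $(1 - B)$ to a toy series. It assumes the standard definitions $\nabla^d = (1 - B)^d$ and $\nabla_S^D = (1 - B^S)^D$; the function name `difference` and the toy data are hypothetical and are not taken from the study.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag)^order to a list of numbers.

    One pass computes x_t - x_{t-lag}, so each pass shortens the
    series by `lag` values.
    """
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# Hypothetical toy series: a linear trend plus a repeating pattern of period S.
S = 4
pattern = [0.0, 2.0, -1.0, 1.0]
x = [0.5 * t + pattern[t % S] for t in range(16)]

# Seasonal differencing (D = 1) cancels the periodic pattern.
seasonal = difference(x, lag=S, order=1)
# Non-seasonal differencing (d = 1) removes what is left of the trend.
stationary = difference(seasonal, lag=1, order=1)

print(seasonal)
print(stationary)
```

In a SARIMA fit, the AR and MA polynomials are then estimated on the differenced series rather than on the raw one.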
In this study, the hyperparameters (p, d, q, P, D, Q) of the SARIMA model were selected using the “forecast” package for R [16]. The value of S that yielded the minimum mean squared error between the historical data and the predicted data was selected to construct the model. The ACF (auto-correlation function) and PACF (partial ACF) were used to check the stationarity of the time series signals, while unit root tests were carried out using the Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests.
Data Analysis and Model Development
Four factors have been considered in constructing the ANN model: Primary Energy Supply (PES) per capita, population, Gross Domestic Product (GDP) per capita, and climate. All four are thought to have a strong influence on electricity consumption [5,17–19]. PES and GDP measure the scale and condition of a country's economy, population size drives the growth in energy demand, and climate affects the energy used to power air-conditioning units for comfort. Fig. 1 presents the Pearson correlation of those factors. The chart shows that annual mean rainfall is weakly correlated with all the other factors involved; it is therefore excluded from this study.
Fig. 1. Pearson correlation between variables of the dataset
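The Pearson screening step can be sketched as follows. This is only an illustration of the coefficient itself: the three short series are made-up numbers, not the MEIH or World Bank data, and the function name `pearson` is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up illustrative values (NOT the data used in the study).
demand = [10.0, 12.0, 15.0, 19.0, 24.0]
gdp = [20.0, 23.0, 27.0, 33.0, 40.0]
rainfall = [3.1, 2.7, 3.3, 2.8, 3.0]

print("demand vs GDP:     ", pearson(demand, gdp))
print("demand vs rainfall:", pearson(demand, rainfall))
```

A variable whose coefficient against the others stays close to zero, as the toy rainfall series does here, is a candidate for exclusion.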
The historical data of PES, GDP, population, and electricity demand from 1980 to 2016 were taken from the Malaysia Energy Information Hub database (https://meih.st.gov.my). The data were split into training and validation sets in a 7:3 ratio to construct the SARIMA models. The models were then used to forecast the respective future values with 95% confidence intervals (CI), from 2017 to 2030. The historical data and the SARIMA results for GDP per capita, population, and PES per capita are shown in Figs. 2–4, respectively.
Fig. 2. Plot of GDP per capita at constant 2010 MYR value from years 1980 to 2030. The hyperparameters are (0, 1, 0) × (1, 1, 0, 11)
Fig. 3. Plot of population in Malaysia from years 1980 to 2030. The hyperparameters are (1, 1, 1) × (0, 1, 1, 7)
Fig. 4. Plot of PES per capita in Malaysia from years 1980 to 2030. The hyperparameters are (0, 1, 0) × (0, 1, 0, 21)
Climate is also a major contributor to energy consumption [17–19]. Only the bi-annual mean average temperature and rainfall data have been taken into consideration in this study. The monthly climate data from 1980 to 2015 were taken from the World Bank database (http://sdwebx.worldbank.org). As rainfall is weakly correlated with energy demand, a SARIMA model was constructed for the temperature data only. The model was then used to forecast the quarterly temperature from 2016 to 2030, as shown in Fig. 5. The statistics of the model residuals presented in Fig. 6 confirm that the SARIMA model is reliable. Fig. 7 presents the historical and forecast trends of annual mean temperature and rainfall in Malaysia from 1980 to 2030.
Fig. 5. Plot of quarterly mean temperature of Malaysia from years 1980 to 2030. The hyperparameters are (1, 1, 1) × (0, 1, 1, 24)
Fig. 6. Diagnostic plot of quarterly mean temperature of Malaysia
Fig. 7. Plot of annually averaged temperature of Malaysia from years 1980 to 2030. The forecast values are based on the SARIMA model presented in Figs. 5 and 6
The forecast values of each variable (see Figs. 2–5 and 7) are described statistically by their 95% prediction intervals, and each variable at each time-step is assumed to be normally distributed and independent. Note that the auto-correlation of each variable has already been dealt with at the SARIMA forecasting stage.
To simulate the possible electricity consumption scenarios from 2020 to 2030, the variables were resampled N_{d} times at each time-step from the joint probability distribution to construct the inputs for the ANN model at a later stage. The statistical properties of each variable, described by the mean ($\mu$) and the standard error of the mean ($\sigma_{\bar{x}}$), at two-year intervals are presented in Tab. 1.
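Tab. 1. Outputs of SARIMA for use as ANN forecast input

| Variable $X$ | Statistic of $N(\mu, \sigma_{\bar{x}}^2)$ | 2022 | 2024 | 2026 | 2028 | 2030 |
|---|---|---|---|---|---|---|
| PES/Cap. [toe/cap.] | $\mu$ | 3.4700 | 3.6100 | 3.8500 | 3.9900 | 3.9700 |
| | $\sigma_{\bar{x}}$ | 0.1784 | 0.2060 | 0.2304 | 0.2524 | 0.2726 |
| Population [thousands] | $\mu$ | 27334 | 27969 | 28829 | 29681 | 30002 |
| | $\sigma_{\bar{x}}$ | 208 | 313 | 421 | 537 | 606 |
| GDP/Cap. [2010 MYR] | $\mu$ | 39053 | 40244 | 42597 | 44357 | 46403 |
| | $\sigma_{\bar{x}}$ | 1025 | 1183 | 1323 | 1535 | 1794 |
| Temperature [°C] | $\mu$ | 26.1462 | 26.0820 | 26.0519 | 26.2483 | 26.1841 |
| | $\sigma_{\bar{x}}$ | 0.3075 | 0.3127 | 0.3157 | 0.3264 | 0.3323 |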
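The link between a 95% interval and the normal parameters in Tab. 1 can be sketched as below. The sketch assumes a symmetric interval of the form $\mu \pm 1.96\sigma$, which the text implies by the normality assumption but does not state explicitly; `normal_from_interval` and `draw` are hypothetical helper names, and simple independent random draws are used here only for illustration.

```python
import random
import statistics

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def normal_from_interval(lower, upper):
    """Recover (mean, sd) of a normal variable from a symmetric 95% interval."""
    mean = 0.5 * (lower + upper)
    sd = (upper - lower) / (2.0 * Z95)
    return mean, sd


# 2030 column of Tab. 1, as (mean, standard error) pairs.
inputs_2030 = {
    "pes_per_cap": (3.9700, 0.2726),
    "population": (30002.0, 606.0),
    "gdp_per_cap": (46403.0, 1794.0),
    "temperature": (26.1841, 0.3323),
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def draw(params, n):
    """Draw n independent normal samples for every variable."""
    return {
        name: [rng.gauss(mu, sd) for _ in range(n)]
        for name, (mu, sd) in params.items()
    }


samples = draw(inputs_2030, 10000)
for name, values in samples.items():
    print(name, statistics.mean(values), statistics.stdev(values))
```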
Artificial Neural Networks
Fig. 8 depicts the Multi-Layer Perceptron (MLP) ANN architecture for this study. It consists of an input layer of four input units, three hidden layers with five units each, and an output layer with one unit for electricity demand. All the units (i.e., neurons) are fully connected in a feed-forward fashion.
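The 4-5-5-5-1 feed-forward structure can be sketched in a few lines of plain Python. This is a minimal illustration, not the trained model: the weights are random placeholders, the input values are arbitrary, ReLU is used in the hidden layers as adopted in this study, and a linear output unit is an assumption, since the output activation is not stated.

```python
import random


def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j][k] links input k to neuron j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def build_mlp(sizes, rng):
    """Create placeholder (weights, biases) pairs for consecutive layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(network, x):
    """Propagate one input vector through the network and return the scalar output."""
    activations = list(x)
    for index, (weights, biases) in enumerate(network):
        is_output = index == len(network) - 1
        # Assumption: linear output unit, ReLU everywhere else.
        activations = dense(activations, weights, biases, identity if is_output else relu)
    return activations[0]


rng = random.Random(42)
net = build_mlp([4, 5, 5, 5, 1], rng)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in net)
y = forward(net, [0.1, 0.2, 0.3, 0.4])  # arbitrary, already-scaled inputs
print("trainable parameters:", n_params)
print("output for the placeholder weights:", y)
```

With these layer sizes the network has 91 trainable parameters, which is small relative to typical deep learning models and consistent with the limited amount of historical data.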
Fig. 8. ANN architecture for UQ-SNN
Fig. 9. Illustration of perceptron model of neuron $a_j^{(i)}$
Each neuron is modelled as a perceptron, as depicted in Fig. 9. Mathematically, the jth neuron in the ith layer emits a signal $y$ in response to the input signals $\{x_1, x_2, \cdots, x_m\}$ as follows:
$$z = \sum_{k=1}^{m} w_{kj}^{(i)} x_k + b_j^{(i)}$$
$$y = \Psi(z) = \max(0, z)$$
where $w_{kj}^{(i)}$ is the weight assigned to the kth input signal, $b_j^{(i)}$ is a constant known as the bias, and $\Psi(\cdot)$ is the activation function. In the present study, the Rectified Linear Unit (ReLU) activation function has been employed because it mitigates the vanishing gradient problem and is fast to compute [20]. Learning of the input-output relationship was realised using the back-propagation algorithm. Adaptive Moments (Adam) optimisation [21] has been used to minimise the loss function $\ell$ (i.e., the objective function) by iteratively adjusting the weights during the learning phase:
$$\ell = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 + \lambda\sum_{j=1}^{m} w_j^2$$
The first term of the loss equation is the Mean Squared Error (MSE) between the model outputs and the target outputs over all n outputs. The second term is a penalty function known as L2-regularisation, which applies a regularisation constant $\lambda$ to all m weights. In conjunction with the back-propagation algorithm, L2-regularisation helps to improve model generalisation by penalising large weight values during the learning phase. In this study, $\lambda = 0.001$ has been used. In addition, early stopping is triggered when $\ell$ fails to improve for 100 successive epochs, in order to prevent overfitting and improve model generalisation.
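The regularised loss and the early-stopping rule can be sketched as follows. This is a minimal illustration under stated assumptions: `regularised_loss` and `EarlyStopping` are hypothetical names (the study itself used TensorFlow), the loss is assumed to be monitored once per epoch, and the small example values are arbitrary.

```python
def regularised_loss(y_true, y_pred, weights, lam=0.001):
    """Mean squared error plus an L2 penalty lam * sum(w^2) over all weights."""
    n = len(y_true)
    mse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty


class EarlyStopping:
    """Signal a stop once the loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience=100):
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def should_stop(self, loss):
        if loss < self.best:
            self.best = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Arbitrary example: MSE = (0.25 + 1.0) / 2, penalty = 0.001 * (1 + 4).
loss = regularised_loss([1.0, 2.0], [1.5, 1.0], weights=[1.0, -2.0])
print("loss:", loss)

# A short patience is used here only to keep the demonstration small.
stopper = EarlyStopping(patience=3)
history = [1.0, 0.9, 0.95, 0.92, 0.91]
decisions = [stopper.should_stop(value) for value in history]
print(decisions)
```

Note that the rule compares against the best loss seen so far, so a run of small fluctuations above that best value still counts towards the patience limit.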
The historical data of the input variables are randomly split in a 7:3 ratio for ANN training and testing, respectively. However, the historical data, which consist of annual values from 1980 to 2015, are not sufficient for the ANN to learn the underlying relationships between the input and output variables. To overcome this, the annual data are interpolated to create another 12 data points between each pair of consecutive years, assuming that each variable varies linearly within the respective year.
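The interpolation step can be illustrated with the sketch below. It assumes that the 12 new points are placed at equal spacing strictly between consecutive annual values; the exact placement used in the study is not stated, and `densify` is a hypothetical helper name.

```python
def densify(values, n_between=12):
    """Insert n_between linearly interpolated points between consecutive values."""
    if len(values) < 2:
        return list(values)
    out = []
    for start, end in zip(values[:-1], values[1:]):
        out.append(start)
        for k in range(1, n_between + 1):
            out.append(start + (end - start) * k / (n_between + 1))
    out.append(values[-1])
    return out


# 36 annual values (1980 to 2015) grow to 36 + 12 * 35 points.
annual = [float(year) for year in range(1980, 2016)]  # placeholder values
dense_series = densify(annual)
print(len(annual), "->", len(dense_series))
```

Because the added points lie on straight lines between the observed values, they enlarge the training set without adding new information about within-year behaviour.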
The modelling is realised with TensorFlow, Google's open-source platform for artificial neural networks and deep learning [22]. The performance of the ANN is presented in Fig. 10. Both the D’Agostino K^{2} and Shapiro-Wilk tests confirm that the validation error (ϵm) is Gaussian. The R^{2} of the cross-validation plot of the computed and validation datasets is 0.9994. A SARIMA model with hyperparameters (0, 1, 2) × (0, 1, 1, 3) has been constructed for validation purposes, with an R^{2} score of 0.9787. This indicates that the proposed ANN fits the validation data better than the conventional SARIMA method. The detailed simulation results from both methods are tabulated in Tab. 2.
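For reference, the R^{2} score can be computed as in the sketch below. It assumes that the reported R^{2} is the usual coefficient of determination, $1 - SS_{res}/SS_{tot}$; the text does not say which definition was used, and the sample numbers are arbitrary.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Arbitrary example values (NOT the validation data of the study).
observed = [100.0, 110.0, 125.0, 140.0]
predicted = [101.0, 109.0, 126.0, 139.0]
print(r2_score(observed, predicted))
```

A score of 1 means the predictions match the observations exactly, while a score of 0 means the model does no better than predicting the mean of the observations.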
Fig. 10. ANN model performance: (a) Validation error; (b) Cross-validation plot
Long-Term Electricity Demand Forecasting
Fig. 11 illustrates the UQ-SNN model architecture. The uncertainty of each input variable is described by the statistical properties obtained from SARIMA modelling (see Tab. 1). The uncertainty induced by the ANN model is treated as an additional input variable, using the $\epsilon_m$ obtained at the ANN model validation stage. The final output $Y$ of the model with uncertainty can be described as follows:
$$Y = f(\mathbf{X}_1, \cdots, \mathbf{X}_4) + \boldsymbol{\epsilon}_m$$
where the bold symbols $\mathbf{X}$ and $\boldsymbol{\epsilon}_m$ represent the N_{d} samples of the inputs and of the model error, with their uncertainties. To determine N_{d}, sample convergence tests have been carried out on the sample $\mu$ and $\sigma_{\bar{x}}$ of $\mathbf{X}$ and $\boldsymbol{\epsilon}_m$. About 10000 samples from the multi-dimensional joint probability distribution are required when using Latin-Hypercube Sampling (LHS). The N_{d} samples are drawn for each year of interest and fed into the ANN model to yield N_{d} forecast outputs. The results are presented in Fig. 12.
Fig. 11. UQ-SNN model architecture
Fig. 12. LTLF using SARIMA and UQ-SNN for Malaysia
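The sampling and propagation step can be sketched as follows. This is a minimal LHS illustration for independent normal variables, not the implementation used in the study: `lhs_normal` is a hypothetical helper, `placeholder_model` is an arbitrary stand-in for the trained ANN $f(\cdot)$, and the model-error standard deviation of 1.0 is a made-up value. Only the four input distributions are taken from the 2030 column of Tab. 1.

```python
import random
from statistics import NormalDist, mean, stdev


def lhs_normal(params, n, rng):
    """Latin-Hypercube sample of independent normals.

    params is a list of (mu, sd). For each variable, [0, 1) is cut into n
    equal strata, one probability is drawn per stratum, the order is
    shuffled, and the probabilities are mapped through the inverse CDF.
    """
    columns = []
    for mu, sd in params:
        probs = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(probs)  # independent pairing across variables
        dist = NormalDist(mu, sd)
        # Clamp to keep probabilities strictly inside (0, 1) for inv_cdf.
        columns.append([dist.inv_cdf(min(max(p, 1e-12), 1.0 - 1e-12)) for p in probs])
    return [list(row) for row in zip(*columns)]


def placeholder_model(x):
    """Arbitrary stand-in for the trained ANN; the coefficients are hypothetical."""
    pes, population, gdp, temperature = x
    return 10.0 * pes + 0.004 * population + 0.001 * gdp + 0.5 * temperature


rng = random.Random(2030)
params_2030 = [
    (3.9700, 0.2726),    # PES per capita [toe/cap.]
    (30002.0, 606.0),    # population [thousands]
    (46403.0, 1794.0),   # GDP per capita [2010 MYR]
    (26.1841, 0.3323),   # temperature [deg C]
    (0.0, 1.0),          # model error eps_m (hypothetical SD)
]
n_d = 10000
draws = lhs_normal(params_2030, n_d, rng)

# Y = f(X1, ..., X4) + eps_m, evaluated once per sample.
outputs = [placeholder_model(row[:4]) + row[4] for row in draws]
print("mean:", mean(outputs), "SD:", stdev(outputs))
```

Stratifying each variable guarantees that every part of its distribution is represented, which is why LHS typically needs fewer samples than plain random sampling to stabilise the output mean and standard deviation.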
Tab. 2 presents the LTLF results obtained using SARIMA and UQ-SNN, together with a comparison of the two methods in terms of the percentage difference with respect to the SARIMA results (%∆) and the percentage of UQ-SNN outputs (%Y) that fall outside the SARIMA 95% CI. In general, UQ-SNN predicts slower electricity consumption growth than SARIMA. By 2030, the electricity consumption in Malaysia projected by UQ-SNN is 207.22 TWh, about 8.48% lower than the SARIMA prediction. Where uncertainty is concerned, the share of UQ-SNN outputs falling outside the SARIMA 95% CI shrinks from 14.7% in 2022 to 0.00% in 2030, as the SARIMA interval widens with the forecast horizon. Although UQ-SNN produces lower consumption growth than the SARIMA model, its predicted mean electricity consumption in each year of interest remains within the SARIMA 95% CI bounds. Therefore, the results obtained by UQ-SNN are comparable with those of the SARIMA model.
Tab. 2. Comparison of LTLF using SARIMA and UQ-SNN

| Forecast horizon [year] | UQ-SNN mean [TWh] | UQ-SNN S.D. [TWh] | SARIMA mean [TWh] | SARIMA min. CI [TWh] | SARIMA max. CI [TWh] | %Δ with respect to SARIMA | %Y beyond SARIMA CI |
|---|---|---|---|---|---|---|---|
| 2022 | 166.68 | 4.39 | 180.59 | 162.09 | 199.09 | –7.70 | 14.7 |
| 2024 | 174.34 | 4.75 | 189.89 | 164.86 | 214.92 | –8.19 | 2.20 |
| 2026 | 184.65 | 5.06 | 202.30 | 170.47 | 234.13 | –8.73 | 0.22 |
| 2028 | 196.99 | 5.52 | 217.12 | 178.74 | 255.50 | –9.27 | 0.05 |
| 2030 | 207.22 | 6.10 | 226.42 | 180.35 | 272.49 | –8.48 | 0.00 |
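The %Δ column can be checked directly from the tabulated means. The sketch below assumes that %Δ is defined as 100 × (UQ-SNN mean − SARIMA mean) / SARIMA mean, which is consistent with the phrase "with respect to SARIMA"; the variable names are hypothetical.

```python
# (UQ-SNN mean, SARIMA mean) in TWh, copied from Tab. 2.
means = {
    2022: (166.68, 180.59),
    2024: (174.34, 189.89),
    2026: (184.65, 202.30),
    2028: (196.99, 217.12),
    2030: (207.22, 226.42),
}

# %Delta values as reported in Tab. 2.
reported = {2022: -7.70, 2024: -8.19, 2026: -8.73, 2028: -9.27, 2030: -8.48}


def pct_diff(uq_snn, sarima):
    """Percentage difference of the UQ-SNN mean relative to the SARIMA mean."""
    return 100.0 * (uq_snn - sarima) / sarima


computed = {year: pct_diff(*pair) for year, pair in means.items()}
for year in sorted(computed):
    print(year, round(computed[year], 2), reported[year])
```

Under this definition the recomputed values agree with the tabulated ones to within about 0.01 percentage points, the small residual being attributable to rounding of the tabulated means.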
Conclusions
LTLF is crucial for the optimum operation and planning of electric power systems. A new LTLF approach called UQ-SNN has been developed and applied to forecast the electricity demand of Malaysia from 2022 to 2030. GDP per capita, PES per capita, population growth, and temperature have been used as inputs for the LTLF of Malaysia. Pearson correlation has been used to study the importance of the variables involved. Because only a limited amount of data is available, 12 additional data points per year of historical data have been created through interpolation for each of the variables. SARIMA models have been constructed to model the values of those variables, with their uncertainty, over the forecast horizons.
An MLP ANN model with 3 hidden layers of 5 units each has been constructed for use as the forecasting engine in the UQ-SNN framework. The validation error of the ANN on historical data is used to construct the model uncertainty and is treated as an input variable. The uncertain variables are then sampled 10000 times using LHS Monte-Carlo simulation to yield the electricity demand in a statistical sense. The forecast results are then compared with the SARIMA predictions of electricity demand over the forecast horizons. Considering that the mean values of the proposed ANN model differ from those of the SARIMA model by less than 10%, it is reasonable to conclude that the proposed method is comparable with the conventional SARIMA model.
The proposed UQ-SNN can capture input- and model-induced uncertainties, which is crucial in LTLF. Although only 4 variables have been used in this study, the proposed method is flexible and can be easily extended to include other variables to increase the model complexity and accuracy. By 2030, UQ-SNN predicts that Malaysia will consume 207.22 TWh of electricity with an SD of 6.10 TWh.
Funding Statement: The project is funded by the Ministry of Higher Education Malaysia, under the Fundamental Research Grant Scheme (FRGS Grant No. FRGS/1/2016/TK07/SEGI/02/1).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
References
[1] Academy of Sciences Malaysia (2013). Sustainable energy options for electric power generation in Peninsular Malaysia to 2030. Perpustakaan Negara Malaysia, ASM Advisory Report 1/2013.
[2] Cheong, K. H., Tai, V. C., Tan, Y. C., Rahman, N. F. Z., Chiong, K. S. et al. (2020). An outlook on large-scale solar power production in Peninsular Malaysia for scenario year 2030.
[3] Tai, V. C., Uhlen, K. (2014). Design and optimisation of offshore grids in Baltic Sea for scenario year 2030.
[4] Carvallo, J. P., Larsen, P. H., Sanstad, A. H., Goldman, C. A. (2018). Long term load forecasting accuracy in electric utility integrated resource planning.
[5] Soliman, S. A., Al-Kandari, A. M. (2010).
[6] Khatoon, S., Ibraheem, Singh, A. K., Priti (2014). Analysis and comparison of various methods available for load forecasting: An overview.
[7] Zhang, X., Liu, Y., Yang, M., Zhang, T., Young, A. A. et al. (2013). Comparative study of four time series methods in forecasting typhoid fever incidence in China.
[8] Çunkaş, M., Altun, A. A. (2010). Long term electricity demand forecasting in Turkey using artificial neural networks.
[9] Stevanoski, B., Mojsoska, N. (2017). Using the analytic hierarchy process in long-term load growth forecast.
[10] Ghelardoni, L., Ghio, A., Anguita, D. (2013). Energy load forecasting using empirical mode decomposition and support vector regression.
[11] Akrom, N., Ismail, Z. (2018). Electricity load demand forecast using fast ensemble-decomposed model.
[12] Soliman, S. A., Al-Kandari, A. M. (2010).
[13] Tang, L., Wang, X., Wang, X., Shao, C., Liu, S. et al. (2019). Long-term electricity consumption forecasting based on expert prediction and fuzzy Bayesian theory.
[14] Shumway, R. H., Stoffer, D. S. (2017).
[15] Brockwell, P. J., Davis, R. A. (2016).
[16] Hyndman, R. J., Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R.
[17] De Felice, M., Alessandri, A., Catalano, F. (2015). Seasonal climate forecasts for medium-term electricity demand forecasting.
[18] De Felice, M., Alessandri, A., Ruti, P. M. (2013). Electricity demand forecasting over Italy: Potential benefits using numerical weather prediction models.
[19] Staffell, I., Pfenninger, S. (2018). The increasing impact of weather on electricity supply and demand.
[20] Wang, Y., Li, Y., Song, Y., Rong, X. (2020). The influence of the activation function in a convolution neural network model of facial expression recognition.
[21] Kingma, D. P., Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[22] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z. et al. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.