Accurate water level prediction in inland waterways is important for proactive flood control and vessel navigation. In this research, a deep learning approach combining the long short-term memory network with the discrete wavelet transform (WA-LSTM) is proposed for daily water level prediction. The wavelet transform decomposes the time series into detail and approximation components for a better understanding of its temporal properties, and a novel LSTM network learns generic water level features through layer-by-layer feature granulation with a greedy layer-wise unsupervised learning algorithm. Six representative reaches of the Yangtze River, namely Jianli, Wuhan, Jiujiang, Anqing, Wuhu, and Nanjing, are investigated; water level data from 2010 to 2019 are processed through temporal and spatial correlation analysis and combination-optimized to develop and evaluate the proposed model. In general, the average test RMSE and MAE are less than 0.045 m and 0.035 m respectively, outperforming state-of-the-art models such as WA-ANN, WA-ARIMA, and LSTM. The results indicate that the WA-LSTM model is stable, reliable, and widely applicable.

The Yangtze River, the longest river in China and the third longest in the world, runs across China from west to east and plays a vital role in the country's economic development. Water level prediction is not only helpful for flood control in the flood season and vessel navigation in the dry season, but also conducive to waterway regulation, port management, and related activities. Thus, accurate and timely prediction is particularly necessary.

In recent decades, a wide variety of approaches have been investigated for water level prediction, broadly divided into model-driven and data-driven methods. The model-driven methods include empirical formulas relating water level to water flow [

The data-driven models of the early period were representative shallow machine learning methods, including support vector regression [

Since neural networks accept an arbitrary number of input features, they naturally support multivariate prediction, and many studies have concentrated on input feature selection and extraction, such as stochastic continuum temporal combinations of water levels at previous time steps [

Despite the huge improvements in water level prediction achieved by the above methods, shallow neural networks have no memory: they fail to capture long-term evolution, can only learn a mapping between input and output patterns, and are thus unable to extract the overall temporal interaction of multiple inputs. Recently, deep neural networks, known as deep learning, have brought dramatic breakthroughs over shallow neural networks [

In this paper, we propose a deep-learning-based prediction model. A novel LSTM network is used to learn generic water level features through layer-by-layer feature granulation with a greedy layer-wise unsupervised learning algorithm, and the discrete wavelet transform is applied to extract features preliminarily for performance improvement. The remainder of this paper is organized as follows. Section 2 introduces the prediction methodology. Section 3 introduces the study area. Section 4 proposes the prediction model. Section 5 presents the experimental results and discussion. Conclusions are drawn in Section 6.

LSTM is a special kind of RNN. Unlike a standard RNN, whose repeating module in the hidden layer has a very simple structure such as a single tanh layer, the repeating module in an LSTM is known as a memory block. Each memory block contains one or more self-connected memory cells and three multiplicative units: an input gate, an output gate, and a forget gate. The input gate allows an incoming signal to alter the state of the memory cell or blocks it; the output gate allows the state of the memory cell to affect other neurons or prevents it; and the forget gate decides when to forget previous outputs, thereby selecting the optimal time lag for the input sequence. This special structure gives the LSTM the ability to bridge very long time lags.
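The gating mechanism described above can be sketched in a few lines of plain Python. The scalar weights below are illustrative toy values, not parameters of the paper's trained model; a real LSTM uses weight matrices and vector states.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    # w maps gate name -> (input weight, recurrent weight, bias); scalars for clarity
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate state
    c = f * c_prev + i * g   # forget gate scales the old state, input gate admits new signal
    h = o * math.tanh(c)     # output gate decides what the cell exposes to other neurons
    return h, c

# run a toy sequence through one cell
w = {k: (0.5, 0.5, 0.0) for k in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for x in [0.1, 0.2, 0.3]:
    h, c = lstm_cell_step(x, h, c, w)
```

Because the cell state `c` is carried forward additively, gradients can flow across many time steps, which is what lets the network bridge long time lags.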

The discrete wavelet transform (DWT) has recently become very popular for analyzing and denoising time series [

where

The coefficients

The reference decomposition level

where
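As a concrete illustration of splitting a series into approximation and detail components, the following is a minimal one-level Haar DWT in plain Python. The paper's experiments use the Meyer wavelet via standard tooling; Haar is shown here only because its filters are simple enough to write out, and the series values are illustrative.

```python
import math

S = math.sqrt(2.0)

def haar_dwt(signal):
    """One-level Haar DWT: scaled pairwise sums (approximation) and differences (details)."""
    approx = [(a + b) / S for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / S for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: perfectly reconstructs the original signal."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / S, (a - d) / S])
    return out

levels = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0]   # illustrative daily water levels
cA, cD = haar_dwt(levels)                   # approximation and detail coefficients
restored = haar_idwt(cA, cD)                # round-trips back to the original series
```

The approximation captures the slow trend while the details capture day-to-day fluctuation, which is exactly the separation the prediction model exploits.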

In this study, the discrete wavelet transform is combined with the LSTM network to predict the water level one day ahead; the combined WA-LSTM model is shown in

The feature selection to determine

The feature decomposition, using the wavelet function to transform each input feature

The decomposed-feature learning and prediction, using an LSTM network for each component separately; the predicted values are

The feature reconstruction, using the wavelet function to obtain the predicted water level
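The decompose–predict–reconstruct flow above can be sketched end-to-end. Here a trivial persistence forecast stands in for the per-component LSTM, and one-level Haar filters stand in for the Meyer wavelet, purely to make the data flow concrete; neither stand-in is the paper's actual configuration.

```python
import math

S = math.sqrt(2.0)

def decompose(x):
    # one-level Haar split into approximation and detail coefficients
    return ([(a + b) / S for a, b in zip(x[0::2], x[1::2])],
            [(a - b) / S for a, b in zip(x[0::2], x[1::2])])

def reconstruct(cA, cD):
    # inverse Haar transform
    out = []
    for a, d in zip(cA, cD):
        out.extend([(a + d) / S, (a - d) / S])
    return out

def predict_component(series):
    # stand-in for a per-component LSTM: persistence forecast (repeat last value)
    return series[-1]

history = [21.3, 21.4, 21.6, 21.9, 22.0, 22.1]   # illustrative daily levels
cA, cD = decompose(history)                       # feature decomposition
next_cA = predict_component(cA)                   # predict each component separately
next_cD = predict_component(cD)
predicted = reconstruct([next_cA], [next_cD])     # feature reconstruction
next_level = predicted[0]                         # predicted water level, one day ahead
```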

In shallow neural networks, the most widely used training algorithm is error back-propagation, but it has proven difficult to train deep neural networks this way: empirically the results are no better and often worse. A reasonable explanation is that gradient-based optimization starting from random initialization may get stuck near poor solutions. Recently, Hinton [

Design the architecture of the network, and randomly initialize parameters including weight matrices and bias vectors.

Pre-train the network one layer at a time in a greedy way, using unsupervised learning from the bottom layer to the top layer in order to preserve feature information from the input;

Fine-tune the whole network using the back-propagation method with gradient-based optimization from the top layer to the bottom layer in a supervised way, searching for optimal parameters by minimizing the cost function defined as:

where
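The supervised fine-tuning stage amounts to gradient descent on the cost. As a minimal sketch, the loop below fits a single linear unit by minimizing a mean squared error cost; this is an illustration of the gradient-based update, not the full multi-layer back-propagation the paper uses, and the data are toy values.

```python
# Fit y = w*x + b by minimizing the mean squared error cost with gradient descent.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # underlying relation: y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05
n = len(xs)
for _ in range(2000):
    # cost J = (1/n) * sum((w*x + b - y)^2); the sums below are dJ/dw and dJ/db
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= lr * grad_w         # step each parameter against its gradient
    b -= lr * grad_b
```

After training, `w` and `b` approach the underlying coefficients 2 and 1; in the full network the same update is applied to every weight matrix and bias vector, with gradients obtained by back-propagation.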

In this study, six routine surveillance reaches in the middle and lower Yangtze River, including Jianli, Wuhan, Jiujiang, Anqing, Wuhu, and Nanjing, are considered as the case study areas (

The water level in the Yangtze River changes daily and shows periodic trends. For instance, the Jianli reach, in

Maximum daily difference (m):

| Year | Jianli | Wuhan | Jiujiang | Anqing | Wuhu | Nanjing |
|---|---|---|---|---|---|---|
| 2010 | 0.65 | 0.48 | 0.54 | | | |
| 2011 | 1.04 | 0.81 | 0.62 | 0.68 | 0.59 | 0.81 |
| 2012 | 0.79 | 0.74 | 1.21 | 1.32 | 0.75 | |
| 2013 | 0.80 | 0.59 | 0.92 | 0.34 | 0.38 | 0.83 |
| 2014 | 0.99 | 0.80 | 0.85 | 1.00 | | |
| 2015 | 1.08 | 1.31 | 1.20 | 1.08 | 0.72 | 0.96 |
| 2016 | 0.80 | 1.31 | 0.52 | 0.75 | 1.17 | 0.89 |

Water levels differ from upstream to downstream.

In neural network models, one of the most important issues for model training is determining the input features. In order to provide the best available input pattern for the LSTM network, the correlation coefficients are calculated based on the coefficient of determination

where
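The correlation screening can be computed directly in plain Python. The function below is the standard Pearson correlation coefficient, applied here as a lag-1 autocorrelation; the series values are illustrative, not the gauge records.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# lag-1 temporal correlation: correlate the series with itself shifted by one day
levels = [21.3, 21.4, 21.6, 21.9, 22.0, 22.1, 21.8, 21.5]
r = pearson_r(levels[:-1], levels[1:])
r2 = r * r   # coefficient of determination
```

Repeating this for lags 1 through 6, and between pairs of stations, yields the temporal and spatial coefficients tabulated below.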

Temporal correlation coefficients:

| Time step | Jianli | Wuhan | Jiujiang | Anqing | Wuhu | Nanjing |
|---|---|---|---|---|---|---|
| | 0.997 | 0.997 | 0.998 | 0.997 | 0.993 | 0.984 |
| | 0.989 | 0.992 | 0.993 | 0.992 | 0.986 | 0.963 |
| | 0.978 | 0.984 | 0.985 | 0.984 | 0.977 | 0.934 |
| | 0.966 | 0.973 | 0.974 | 0.973 | 0.966 | 0.912 |
| | 0.952 | 0.960 | 0.962 | 0.961 | 0.953 | 0.891 |
| | 0.938 | 0.947 | 0.948 | 0.948 | 0.939 | 0.874 |

Spatial correlation coefficients:

| | | | | | |
|---|---|---|---|---|---|
| 1.000 | 0.839 | 0.877 | –0.112 | –2.498 | |
| 0.784 | 1.000 | 0.805 | –1.556 | –5.890 | |
| 0.866 | 0.932 | 1.000 | –0.471 | –3.552 | |
| 0.886 | 0.866 | 1.000 | –0.021 | –2.473 | |
| 0.505 | 0.154 | 0.450 | 0.509 | 1.000 | |
| 0.099 | –0.319 | 0.016 | 0.033 | 1.000 | |

Considering the correlation analysis in

| Prediction function | Prediction steps |
|---|---|
| Temporal correlation | |
| Spatial-temporal correlation | |

The LSTM network is sensitive to the scale of the input data, especially when the tanh and ReLU activation functions are used. In addition, from
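Min-max scaling is a common way to handle this scale sensitivity; a minimal sketch follows. The paper does not state its exact scaling range, so the [0, 1] range and the sample values here are assumptions for illustration.

```python
def minmax_scale(series):
    """Scale a series to [0, 1]; also return (lo, hi) so predictions can be un-scaled."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series], (lo, hi)

def minmax_invert(scaled, bounds):
    """Map scaled values back to the original units (metres)."""
    lo, hi = bounds
    return [v * (hi - lo) + lo for v in scaled]

levels = [21.3, 22.6, 24.1, 23.2, 21.9]      # illustrative daily water levels
scaled, bounds = minmax_scale(levels)         # network input
restored = minmax_invert(scaled, bounds)      # round-trips back to metres
```

In practice the bounds are taken from the training split only, so that the test set does not leak information into the scaling.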

The discrete wavelet transform process mainly involves two choices. One is selecting an appropriate wavelet function as the mother wavelet; widely used options are the haar, db2, meyer, sym2, bior1.1, rbio1.1, and coif1 wavelets. The other critical point is determining the decomposition level; according to the

In order to evaluate the performance of the proposed model for water level prediction, two widely used criteria are applied to measure the error of the predicted data: the root mean squared error (RMSE) and the mean absolute error (MAE). Their mathematical equations are defined as:

where
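The two criteria can be computed directly; the observed and predicted values below are illustrative, not results from the paper.

```python
import math

def rmse(obs, pred):
    """Root mean squared error between observed and predicted series."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error between observed and predicted series."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

observed  = [21.30, 21.42, 21.55, 21.60]
predicted = [21.32, 21.40, 21.58, 21.57]
```

Because RMSE squares the residuals, it penalizes large errors more heavily than MAE, so RMSE is always at least as large as MAE on the same data.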

In order to train the WA-LSTM model parameters and prove its predictive ability, the dataset is split into two parts: the first 70% is used as the training sample, while the remaining 30% is employed as the testing sample for measuring the prediction performance of the proposed networks.
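Because the data form a time series, the split is chronological rather than random; a minimal sketch, with a stand-in sequence in place of the daily records:

```python
def chrono_split(series, train_frac=0.7):
    """Chronological split: no shuffling, so the test set follows the training set in time."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

data = list(range(10))            # stand-in for daily water level records
train, test = chrono_split(data)  # first 70% for training, last 30% for testing
```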

The effectiveness of deep learning depends strongly on the LSTM network topology, so before applying WA-LSTM to the dataset, appropriate hyper-parameters must be fitted. As shown in

| Parameters | Values | | | | |
|---|---|---|---|---|---|
| Hidden layers | 3 | 3 | 2 | 2 | 2 |
| Hidden layer neurons | 20 | 20 | 30 | 30 | 30 |
| Batch size | 15 | 15 | 15 | 15 | 15 |
| Epochs | 1500 | 1000 | 500 | 300 | 300 |

According to

In terms of prediction precision, the lowest RMSE and MAE of the Jianli, Wuhan, Jiujiang, Anqing, Wuhu, and Nanjing reaches are 0.035 and 0.028, 0.043 and 0.034, 0.028 and 0.019, 0.030 and 0.023, 0.038 and 0.030, and 0.036 and 0.025 m, respectively. Such results are impressive when looking into

Predicted error (m):

| Prediction | Jianli | | Wuhan | | Jiujiang | | Anqing | | Wuhu | | Nanjing | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE |
| | 0.162 | 0.125 | 0.188 | 0.139 | 0.161 | 0.121 | 0.151 | 0.110 | 0.096 | 0.095 | 0.172 | 0.126 |
| | 0.046 | 0.035 | 0.061 | 0.041 | 0.048 | 0.039 | 0.043 | 0.035 | 0.045 | 0.032 | 0.055 | 0.043 |
| | 0.041 | 0.031 | 0.058 | 0.048 | 0.039 | 0.033 | 0.042 | 0.031 | 0.044 | 0.030 | 0.051 | 0.038 |
| | 0.039 | 0.030 | 0.049 | 0.032 | 0.036 | 0.030 | 0.035 | 0.027 | 0.050 | 0.034 | 0.039 | 0.026 |
| | 0.037 | 0.029 | 0.048 | 0.034 | 0.044 | 0.031 | 0.043 | 0.031 | | | | |
| | 0.030 | 0.023 | 0.038 | 0.029 | | | | | | | | |
| | 0.043 | 0.030 | 0.059 | 0.048 | 0.033 | 0.027 | 0.033 | 0.026 | 0.043 | 0.030 | 0.041 | 0.028 |
| | 0.227 | 0.170 | 0.197 | 0.150 | 0.148 | 0.103 | 0.129 | 0.089 | 0.132 | 0.097 | 0.162 | 0.120 |
| | 0.061 | 0.045 | 0.064 | 0.048 | 0.051 | 0.042 | 0.051 | 0.041 | 0.044 | 0.030 | 0.054 | 0.042 |
| | 0.054 | 0.043 | 0.058 | 0.042 | 0.042 | 0.033 | 0.045 | 0.037 | 0.043 | 0.032 | 0.053 | 0.041 |
| | 0.053 | 0.042 | 0.079 | 0.063 | 0.065 | 0.045 | 0.040 | 0.028 | 0.047 | 0.033 | 0.044 | 0.033 |

According to

Predicted error (m):

| Function | Jianli | | Wuhan | | Jiujiang | | Anqing | | Wuhu | | Nanjing | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE |
| * | 0.108 | 0.067 | 0.095 | 0.061 | 0.057 | 0.039 | 0.071 | 0.045 | 0.102 | 0.061 | 0.094 | 0.076 |
| meyer | | | | | | | | | | | | |
| db2 | 0.084 | 0.060 | 0.093 | 0.059 | 0.051 | 0.035 | 0.058 | 0.043 | 0.092 | 0.051 | 0.092 | 0.067 |
| haar | 0.105 | 0.062 | 0.094 | 0.068 | 0.055 | 0.042 | 0.064 | 0.049 | 0.094 | 0.059 | 0.082 | 0.056 |
| coif1 | 0.074 | 0.051 | 0.088 | 0.048 | 0.050 | 0.033 | 0.056 | 0.036 | 0.105 | 0.052 | 0.104 | 0.075 |
| sym2 | 0.083 | 0.059 | 0.092 | 0.056 | 0.044 | 0.032 | 0.056 | 0.041 | 0.102 | 0.051 | 0.090 | 0.064 |
| bior1.1 | 0.102 | 0.069 | 0.105 | 0.067 | 0.058 | 0.041 | 0.068 | 0.043 | 0.105 | 0.064 | 0.096 | 0.068 |
| rbio1.1 | 0.099 | 0.065 | 0.101 | 0.058 | 0.057 | 0.040 | 0.070 | 0.044 | 0.098 | 0.052 | 0.097 | 0.073 |

* indicates no wavelet transform.

Predicted error (m):

| Decomposition level (Meyer) | Jianli | | Wuhan | | Jiujiang | | Anqing | | Wuhu | | Nanjing | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE | RMSE | MAE |
| 1 | 0.108 | 0.081 | 0.091 | 0.075 | 0.058 | 0.045 | 0.075 | 0.055 | 0.081 | 0.063 | 0.096 | 0.070 |
| 2 | 0.068 | 0.052 | 0.068 | 0.050 | 0.053 | 0.041 | 0.062 | 0.049 | 0.069 | 0.044 | 0.076 | 0.060 |
| 3 | 0.061 | 0.047 | 0.072 | 0.051 | 0.039 | 0.029 | 0.038 | 0.058 | 0.042 | 0.031 | 0.056 | 0.043 |
| 4 | | | | | | | | | | | | |
| 5 | 0.102 | 0.064 | 0.065 | 0.043 | 0.045 | 0.035 | 0.050 | 0.038 | 0.047 | 0.033 | 0.056 | 0.044 |

In order to confirm the effectiveness and generalization of the WA-LSTM model, comparison experiments are carried out with state-of-the-art prediction models, namely ANN, LSTM, and ARIMA. These models are also combined with the Meyer wavelet transform at level 4, and their prediction structures are the same as in

Predicted errors (m):

| Reaches | WA-LSTM | | | LSTM | | | WA-ANN | | | WA-ARIMA | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | RMSE | MAE | Steps | RMSE | MAE | Steps | RMSE | MAE | Steps | RMSE | MAE | Steps |
| Jianli | 0.046 | 0.031 | 0.051 | 0.039 | 0.066 | 0.048 | | | | | | |
| Wuhan | 0.059 | 0.044 | 0.065 | 0.056 | 0.075 | 0.059 | | | | | | |
| Jiujiang | 0.033 | 0.028 | 0.039 | 0.028 | 0.045 | 0.034 | | | | | | |
| Anqing | 0.039 | 0.026 | 0.045 | 0.032 | 0.049 | 0.037 | | | | | | |
| Wuhu | 0.058 | 0.048 | 0.061 | 0.045 | 0.065 | 0.050 | | | | | | |
| Nanjing | 0.054 | 0.038 | 0.060 | 0.042 | 0.061 | 0.047 | | | | | | |

In this research, a new WA-LSTM model based on the discrete wavelet transform and the long short-term memory network is proposed for water level prediction in the Yangtze River, to help flood control and vessel navigation. In the proposed model, the water level time series is first decomposed into high-frequency and low-frequency components using wavelet transforms at different scales for a better understanding of its temporal properties; each component is then fed into an LSTM network for independent prediction; finally, the predicted values are reconstructed to obtain the predicted water level one day ahead.

In order to confirm the effectiveness and generalization of the model, six representative reaches including Jianli, Wuhan, Jiujiang, Anqing, Wuhu, and Nanjing are studied, and several comparisons are developed: the practicability of temporal and spatial combinations, the sensitivity to mother wavelet types and decomposition levels, and the efficiency relative to the state-of-the-art models LSTM, WA-ANN, and WA-ARIMA. Comprehensive experiments show that using 5 or 6 days of lagged observations as input features with the Meyer wavelet transform at decomposition level 4 provides the best performance, with less than 0.045 m RMSE and less than 0.035 m MAE in general, and as low as 0.028 m RMSE and 0.019 m MAE at the Jiujiang reach. These results are superior to those of the competing models and demonstrate that the WA-LSTM model has strong applicability and generalization, providing a reference for further research on water level prediction in the Yangtze River.

Future research will look into more comprehensive prediction that incorporates temporal characteristics such as the dry and flood seasons, weather forecasts such as rainstorms, or waterway tributary characteristics. Furthermore, it would be interesting to investigate other deep learning models for water level prediction.

The authors are very thankful to the Changjiang Maritime Safety Administration for making the data resources available.