To predict highway traffic flow accurately, this paper proposes a Convolutional Neural Network-Bi-directional Long Short-Term Memory-Attention Mechanism (CNN-BiLSTM-Attention) traffic flow prediction model based on Kalman-filtered data. First, the original fluctuating data are smoothed by Kalman filtering, which reduces the instability that unexpected incidents introduce into short-term traffic flow prediction. Then a one-dimensional CNN extracts the local spatial features of the traffic data in different periods and reduces their dimensionality, and a BiLSTM network analyzes the time series information. Finally, the Attention Mechanism assigns feature weights and performs Softmax regression. The experimental results show that the CNN-BiLSTM-Attention model predicts more accurately on Kalman-filtered data. Compared with the CNN-BiLSTM model, the Kal-CNN-BiLSTM-Attention model reduces the Root Mean Square Error (RMSE) by 17.58 and the Mean Absolute Error (MAE) by 12.38, and its accuracy is almost unaffected by non-working days. To further verify the model’s applicability, the experiments were re-run on two other sets of fluctuating data, and the results again demonstrated the stability of the model. Therefore, the Kal-CNN-BiLSTM-Attention traffic flow prediction model proposed in this paper applies to a broader range of data and achieves higher accuracy.

In recent years, rising demand for cars and the development of new energy vehicles have rapidly increased the number of private cars. By September 2022, the number of cars in 72 cities nationwide reached 315 million, and this vast number of cars has put enormous pressure on urban road traffic. The service level of urban roads is an essential reflection of the country’s economic development. As an essential part of urban road traffic, highways affect the country’s economic level and people’s quality of life through their operating efficiency. Therefore, predicting traffic flow by analyzing the data obtained from highway monitors is the research direction of many scholars today. A large number of methods have been proposed, such as the early traditional linear forecasting method, which is relatively simple in its operation steps but cannot reflect the traffic flow state accurately because traffic flow fluctuates from moment to moment [

Traffic flow prediction can be divided into long-term, medium-term, and short-term prediction according to the length of the forecast horizon. Long-term prediction mainly refers to predicting the traffic flow of the selected areas for the next year or even the next few years; medium-term prediction mainly predicts the traffic flow over three time periods: weekly, daily, and hourly; short-term prediction mainly predicts the real-time traffic flow of a specified road section for the next fifteen minutes [

Short-term traffic flow prediction is affected by unexpected incidents, which places high demands on the prediction model, and handling the resulting abnormal values becomes the key to prediction [

Convolutional Neural Networks (CNN): Responsible for extracting spatial feature vectors and dimensionality reduction of data [

Bi-directional Long Short-Term Memory (BiLSTM): It comprehensively analyzes sequence information by forward LSTM and backward LSTM, then extracts temporal features.

Attention Mechanism: Different weight coefficients are assigned to different features, and the features are then weighted by these coefficients to produce the output results [

The remaining sections of this paper are organized as follows: in Section 2, the research of some scholars in the field of intelligent traffic flow prediction in recent years is reviewed; in Section 3, the preprocessing of experimental data is introduced; in Section 4, the primary model used in this experiment is introduced; in Section 5, the training process of the model and the experimental results obtained are presented; in Section 6, the experimental conclusion is given.

Since the start of the 21st century, computer science and technology have developed rapidly. Artificial intelligence has been applied to all aspects of life, and more intelligent traffic prediction methods have emerged; for example, many scholars have proposed prediction methods based on Machine learning and Deep learning [

Feng et al. [

In the meantime, Deep learning techniques have been developed in traffic flow prediction. Fukuda et al. [

The above literature reviews the applications of Machine learning and Deep learning in traffic flow prediction in recent years [

Because recent highway data involve national confidentiality and security, it is not appropriate to use data from the last three years. The PEMS04 data set used in this experiment comes from the California highway network: traffic flow data were collected every 5 min from 307 detectors for 58 consecutive days starting from January 1, 2018. All the traffic flow data obtained are plotted as a waveform graph. As shown in

The traffic volumes for each hour of the day are plotted as box plots in

The three main feature parameters of this experimental data are flow, occupancy, and speed [

The Kalman filter is widely used in fields such as communication systems, aerospace, and industrial control. Its core idea is state estimation, which helps to understand and control the system. It can estimate and predict the operating state of the system in real time, so the Kalman filter serves for prediction as well as filtering. However, the general Kalman filter model is less accurate in short-term traffic flow prediction, so this experiment uses the Kalman filter only to process the data [

Due to the interference of external factors, the traffic flow data were found to be too volatile when plotted, so they were subjected to Kalman filtering. Data filtering is generally the process of removing the noise generated by external disturbances to restore the authenticity of the original data. Kalman filtering feeds the data into the time update equation and the state update equation, updates the state quantities in real time, and finally gives an optimal estimate of the results. The Kalman filter time update equation is

Its state update equation is

The fluctuation of the data after the Kalman filtering process becomes significantly smaller, as shown in
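
The time update and state update steps described above can be sketched for a single traffic flow series as follows. This is a minimal illustration, not the paper's implementation: it assumes a scalar random-walk state model, and the names `process_var` and `measurement_var` and their values are hypothetical choices made for the example.

```python
def kalman_smooth(measurements, process_var=1e-2, measurement_var=1.0):
    """Scalar Kalman filter assuming a random-walk state model (an assumption)."""
    if not measurements:
        return []
    estimate = measurements[0]   # initial state estimate
    error_var = 1.0              # initial estimate variance (assumed value)
    smoothed = [estimate]
    for z in measurements[1:]:
        # Time update (prediction): the state is carried over unchanged
        # and its uncertainty grows by the process noise.
        pred_estimate = estimate
        pred_var = error_var + process_var
        # State update (correction): blend prediction and measurement
        # according to the Kalman gain.
        gain = pred_var / (pred_var + measurement_var)
        estimate = pred_estimate + gain * (z - pred_estimate)
        error_var = (1.0 - gain) * pred_var
        smoothed.append(estimate)
    return smoothed


# Hypothetical, strongly fluctuating flow readings.
noisy_flow = [100, 140, 90, 150, 95, 145, 100, 150]
filtered = kalman_smooth(noisy_flow)
```

A smaller `process_var` relative to `measurement_var` makes the filter trust its own prediction more, which gives a smoother output at the cost of a slower response to real changes in flow.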

The filtered data are then normalized with the commonly used maximum-minimum normalization method to facilitate later model input. It can be defined as
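
Maximum-minimum normalization maps each value to the range [0, 1] by subtracting the minimum and dividing by the range. A minimal sketch is given below; the function names are illustrative, and the handling of a constant series is an assumption added to avoid division by zero.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: the range is zero, so map everything to 0.0 (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def min_max_denormalize(scaled, lo, hi):
    """Invert the scaling so predictions can be read in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


scaled_flow = min_max_normalize([10, 20, 30])
restored_flow = min_max_denormalize(scaled_flow, 10, 30)
```

In practice the minimum and maximum are usually taken from the training set only and reused for the test set, so that no information from the test period leaks into training.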

Finally, the data are divided into a training set and a test set: the 51 consecutive days from January 1 to February 20 form the training set, and the last seven days, from February 21 to February 27, form the test set.

Traffic flow prediction experiments cannot ignore three basic feature parameters: flow, occupancy, and speed. In essence, traffic flow prediction uses these three parameters to predict the traffic flow in the following period, and short-term prediction depends on them especially strongly. This experiment adds five new parameters to these basic feature parameters: hour, day of the week, month, first-order difference, and second-order difference, so there are eight feature parameters in this experiment.
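
The eight feature parameters can be assembled per 5-minute sample roughly as sketched below. The text does not say which series is differenced or how the first samples are handled, so this sketch assumes that the differences are taken on the flow series and padded with zeros; the function name `build_features` is hypothetical.

```python
from datetime import datetime, timedelta


def build_features(start, flow, occupancy, speed, step_minutes=5):
    """Return one 8-element feature row per sample.

    Row layout: flow, occupancy, speed, hour, day of week (Monday = 0),
    month, first-order difference, second-order difference.
    """
    rows = []
    for i in range(len(flow)):
        timestamp = start + timedelta(minutes=step_minutes * i)
        # Differences are computed on flow (an assumption) and set to 0
        # where not enough history exists.
        diff1 = flow[i] - flow[i - 1] if i >= 1 else 0.0
        diff2 = flow[i] - 2 * flow[i - 1] + flow[i - 2] if i >= 2 else 0.0
        rows.append([flow[i], occupancy[i], speed[i],
                     timestamp.hour, timestamp.weekday(), timestamp.month,
                     diff1, diff2])
    return rows


# Hypothetical readings for the first three 5-minute intervals of the data set.
rows = build_features(datetime(2018, 1, 1), [10, 13, 19], [0.1, 0.2, 0.3], [60, 58, 55])
```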

Research and experiments show that the prediction results are better when the convolutional neural network layer is used as the input layer of the model. Generally, a CNN consists of an input layer, a convolutional layer, a pooling layer, and a fully connected layer. After the data enter through the input layer, the convolutional layer extracts the feature information, and the pooling layer selects among these features to reduce the number of variables in the feature map, thus reducing the computation and improving efficiency. From the pooling layer to the fully connected layer, the data are mapped to a lower dimension, which makes them easier to process [

In this paper, a one-dimensional CNN is used to enable feature extraction of time series. Known sequence:

CNN extracts the features of a time series, which is to find a sequence of length
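
How a one-dimensional convolution followed by pooling shortens a time series can be sketched as below. The sketch uses the time step (21), filter length (5), and pooling length (4) listed in the parameter table, while the absence of padding, the stride of 1, the ReLU activation, and the moving-average kernel are assumptions made for illustration. As in common deep learning frameworks, the "convolution" is computed as a sliding dot product.

```python
def conv1d_valid(sequence, kernel, bias=0.0):
    """Slide the kernel over the sequence without padding (stride 1)."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(sequence) - k + 1)]


def relu(values):
    """Rectified linear activation (assumed; the text does not name one)."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, pool):
    """Keep the maximum of each non-overlapping window of length `pool`."""
    return [max(values[i:i + pool])
            for i in range(0, len(values) - pool + 1, pool)]


sequence = [float(i) for i in range(21)]   # 21 time steps of one feature
kernel = [0.2] * 5                         # filter length 5 (moving average)
feature_map = relu(conv1d_valid(sequence, kernel))
pooled = max_pool1d(feature_map, 4)        # pooling length 4
```

Under these assumptions, 21 input steps give 17 convolution outputs and 4 pooled values per filter; a trained layer would learn the kernel weights instead of using a fixed average.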

Unlike convolutional neural networks, which deal with local spatial feature information, recurrent neural networks (RNN) are used to analyze time series information. Because the memory of a traditional RNN is limited, it cannot solve the long-term dependence problem: if the time series is too long, the influence of early inputs gradually weakens as it propagates through the network, which gives rise to the vanishing gradient problem. The long short-term memory (LSTM) network model was created to solve this problem. Compared with the traditional RNN model, the LSTM model is much more complex in structure, mainly because it adds three controllers: the input gate, the forget gate, and the output gate. The input gate obtains external information; the forget gate decides whether to selectively forget the information in the neuron cells; the output gate is responsible for outputting the current state information. This unique “gate” structure addresses the vanishing and exploding gradient problems of the traditional RNN model on long sequences and gives the network a better memory [
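
The role of the three gates can be made concrete with a single LSTM time step. The sketch below follows the standard LSTM formulation but, to stay short, assumes a scalar input and a scalar hidden state; the `params` dictionary and its keys are hypothetical names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step for a scalar input and scalar hidden state.

    `params` maps each of "forget", "input", "output", "candidate"
    to a tuple (input weight, recurrent weight, bias).
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    forget_gate = sigmoid(pre_activation("forget"))    # how much old memory to keep
    input_gate = sigmoid(pre_activation("input"))      # how much new information to admit
    output_gate = sigmoid(pre_activation("output"))    # how much of the cell state to expose
    candidate = math.tanh(pre_activation("candidate")) # proposed new memory content

    c = forget_gate * c_prev + input_gate * candidate  # updated cell state
    h = output_gate * math.tanh(c)                     # updated hidden state
    return h, c


zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

Because the cell state is updated additively through the forget gate rather than being squashed at every step, information can persist over many time steps.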

Both RNN and LSTM predict the output of the next moment from the previous information, which cannot capture the dependence of bidirectional information, resulting in the inability to reasonably predict the state value of the current moment in dealing with some specific problems. The bi-directional LSTM can take into account both forward and backward information. For each moment, the system will provide the input to two LSTMs with the same structure but opposite directions, which accurately determine the output of the current moment by combining the before and after information, and thus are more suitable for handling long sequence tasks with bidirectional dependencies. The bi-directional LSTM is composed of two unidirectional LSTMs with identical algorithms, and its neural network structure is shown in

The forward propagation layer and the backward propagation layer each apply a nonlinear transformation to the input data to obtain a hidden-layer output, and the two independent hidden-layer outputs are then combined through a connection layer and passed to the same output layer. The formula is expressed as
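
The combination of a forward and a backward pass can be sketched as follows. To keep the example short, a simple tanh recurrent cell stands in for the LSTM cell, and the two hidden outputs are combined by concatenation; both choices are assumptions, since the combining formula is not reproduced here.

```python
import math


def run_rnn(sequence, w_x, w_h, b):
    """Run a simple tanh recurrent cell (a stand-in for an LSTM cell)."""
    h = 0.0
    outputs = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        outputs.append(h)
    return outputs


def bidirectional(sequence, forward_params, backward_params):
    """Pair each time step's forward and backward hidden outputs."""
    forward = run_rnn(sequence, *forward_params)
    # Process the reversed sequence, then restore chronological order.
    backward = run_rnn(sequence[::-1], *backward_params)[::-1]
    return list(zip(forward, backward))


outputs = bidirectional([0.1, 0.5, -0.3],
                        forward_params=(1.0, 0.5, 0.0),
                        backward_params=(1.0, 0.5, 0.0))
```

At each time step the forward output summarizes the past and the backward output summarizes the future, so the pair carries context from both directions.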

In general, Deep learning ignores the weights of different pieces of information when extracting data features. Because different information affects the output differently, ignoring these differences causes important information to be lost, which shows the importance of attending to the critical areas during information processing. For example, when we read an article, some important words can help us understand the article more quickly and accurately, which is the practical meaning of the Attention Mechanism proposed by researchers. The basic idea of the Attention Mechanism is to identify the importance of different information and give more attention to the information with a relatively large weight, which is in essence the distribution of weight factors [

The CNN-BiLSTM-Attention model mainly consists of an input layer, a CNN layer, a BiLSTM layer, a dropout layer, an Attention layer, two fully connected layers, and an output layer. The CNN layer and BiLSTM layer are followed by the dropout layer, which helps to avoid overfitting. The specific training process is as follows: first, the pre-processed traffic flow data are input into the CNN layer for spatial feature extraction, and the spatial correlation features of each time step are obtained by sliding a one-dimensional convolutional kernel over the data, which yields multiple sets of feature vectors. Then the feature vectors are input into the BiLSTM layer, which processes the data in both directions to fully extract the temporal feature information of the sequence. Finally, the Attention Mechanism introduces a weight vector, and the weight of each feature is calculated and normalized with the Softmax function. The local information of each time step is weighted and summed to obtain the global features, and the attention coefficient of each time node describes the relevance of that node to traffic flow prediction. The attention coefficients are applied as weights to obtain the final prediction results. The calculation process is given by
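
The attention step described above (score each time step, normalize the scores with Softmax, then take the weighted sum) can be sketched as below. The tanh scoring function with a weight vector `w` and bias `b` is a common additive form and is assumed here; it is not necessarily the exact formulation used in this paper.

```python
import math


def softmax(scores):
    """Numerically stable Softmax: weights are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w, b=0.0):
    """Weight each time step's hidden vector and sum them into one context vector."""
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b)
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


hidden = [[1.0, 0.0], [0.0, 1.0]]                    # two time steps, two features each
uniform_weights, mean_context = attention_pool(hidden, w=[0.0, 0.0])
focused_weights, _ = attention_pool(hidden, w=[1.0, 0.0])
```

With a zero weight vector every time step scores the same and the context vector is a plain average; a learned `w` shifts the weights toward the time steps that matter most for the prediction.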

In this paper, a CNN-BiLSTM network prediction model based on an Attention Mechanism that can learn features in both spatial and temporal dimensions simultaneously is constructed. The internal structure of this model is shown in

The hardware host for the experiments uses an AMD Ryzen 5 4600H equipped with an NVIDIA GTX2060ti and 16 GB of RAM; the software uses pytorch3.7.8 in an environment of Keras 2.3.1 and TensorFlow 1.15.2 [

The total number of samples in this experiment is 16992, including 2016 prediction samples, and the ratio of the training set to the test set is 7:1. Most articles predict only weekday traffic flow and omit non-working days. The original data show that, because of various external factors, the traffic flow on non-working days differs from that on working days. In contrast, this paper predicts the traffic flow of a whole week without reducing the prediction accuracy of the model, and achieves a better result. Two main evaluation indexes are used on the test set of traffic flow prediction: Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), which are calculated by
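
RMSE and MAE follow their standard definitions: the square root of the mean squared error and the mean of the absolute errors between actual and predicted values. A minimal sketch, with illustrative function names and made-up sample values:

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: penalizes large errors more strongly."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the errors."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical flows: the two predictions are off by 3 and 4 vehicles.
actual_flow = [100.0, 120.0]
predicted_flow = [103.0, 116.0]
rmse_value = rmse(actual_flow, predicted_flow)
mae_value = mae(actual_flow, predicted_flow)
```

RMSE is never smaller than MAE on the same data, and the gap between them grows when a few samples have much larger errors than the rest.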

The model’s input is the historical traffic data with eight feature parameters, such as flow, occupancy, and speed. The output is the traffic forecast for the last seven days, and the model’s accuracy is then measured by comparing the forecast with the actual traffic data of the last seven days in the dataset.
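
One way to turn the historical series into model inputs and targets is a sliding window. The sketch below assumes one-step-ahead prediction and uses a window of 21 time steps with 8 features, matching the time step and input size listed in the parameter table; the function name `make_windows` is hypothetical.

```python
def make_windows(features, target, time_step=21):
    """Pair each window of `time_step` feature rows with the next target value."""
    inputs, outputs = [], []
    for end in range(time_step, len(features)):
        inputs.append(features[end - time_step:end])  # rows end-21 .. end-1
        outputs.append(target[end])                   # value to predict
    return inputs, outputs


# Hypothetical data: 30 samples, each with 8 feature values.
features = [[float(i)] * 8 for i in range(30)]
target = [float(i) for i in range(30)]
window_inputs, window_outputs = make_windows(features, target)
```

Each input window has shape 21 × 8 and each output is a single value, so a series of N samples yields N − 21 training pairs under these assumptions.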

When using PyTorch to reproduce the results, the random nature of the algorithm means that the same data and code can give very different results from run to run, so a random seed must be set to reduce the variability of model reproduction. This experiment uses the Adam optimizer for parameter optimization. Adam updates the variables using the historical information of the gradient; here it runs with a learning rate of 0.001 for 300 iterations, and its main parameters are set in

Parameter name | Quantity |
---|---|
Number of characteristics | 8 |
Time step | 21 |
Filter length | 5 |
Number of filters | 18 |
Pooling length | 4 |
Input number set | 21 × 8 |
Output number set | 1 |
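
The way Adam uses the gradient history can be illustrated on a one-parameter problem. The sketch below follows the standard Adam update with the learning rate of 0.001 stated in the text and treats the 300 iterations as 300 update steps; the decay rates `beta1` and `beta2`, the constant `eps`, and the quadratic objective are commonly used defaults and illustrative assumptions, not values reported in this paper.

```python
import math


def adam_minimize(grad_fn, theta, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, iterations=300):
    """Minimize a one-parameter function given its gradient, using Adam."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, iterations + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for m and v starting at zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Gradient of the illustrative objective (theta - 3)^2.
theta_final = adam_minimize(lambda theta: 2.0 * (theta - 3.0), theta=0.0)
```

Because each step is scaled by the ratio of the two moment estimates, the step size stays close to the learning rate regardless of the raw gradient magnitude, so 300 steps at 0.001 move the parameter only a short distance toward the minimum in this toy problem.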

To check the model’s accuracy, this experiment compared the prediction results of the benchmark models BiLSTM and CNN-BiLSTM with the prediction results after adding Kalman filtering. It was found that the training results of the data without Kalman filtering were poor in both models, but the prediction accuracy of both models was significantly improved after the data were processed by Kalman filtering, and the specific results are shown in

The experimental results show that both the BiLSTM model and the CNN-BiLSTM model fit better with the addition of Kalman filtering. Comparing the RMSE and MAE of the four models shows that, in the seven-day prediction set, the RMSE and MAE values of both the BiLSTM-Kal model and the CNN-BiLSTM-Kal model are significantly smaller than those of the BiLSTM and CNN-BiLSTM models. The resulting data are tallied in

Date | BiLSTM RMSE | BiLSTM MAE | BiLSTM-KAL RMSE | BiLSTM-KAL MAE | CNN-BiLSTM RMSE | CNN-BiLSTM MAE | CNN-BiLSTM-KAL RMSE | CNN-BiLSTM-KAL MAE |
---|---|---|---|---|---|---|---|---|
2.21 | 39.76 | 29.78 | 26.77 | 18.16 | 39.98 | 29.21 | 24.66 | 16.75 |
2.22 | 39.28 | 29.64 | 22.91 | 16.77 | 37.99 | 28.49 | 22.65 | 15.67 |
2.23 | 32.54 | 24.14 | 18.52 | 13.44 | 30.48 | 21.73 | 17.69 | 12.60 |
2.24 | 31.31 | 22.95 | 16.91 | 12.06 | 29.65 | 20.93 | 16.18 | 11.31 |
2.25 | 38.00 | 27.32 | 22.10 | 15.89 | 38.14 | 26.24 | 19.40 | 13.65 |
2.26 | 37.27 | 26.81 | 21.41 | 15.59 | 36.62 | 25.99 | 20.06 | 14.43 |
2.27 | 35.10 | 25.94 | 20.65 | 14.70 | 33.47 | 23.47 | 18.67 | 13.26 |
Average | 36.18 | 26.65 | 21.32 | 15.23 | 35.19 | 25.15 | 19.90 | 13.95 |

To further improve the results, this experiment added the Attention Mechanism to the CNN-BiLSTM-Kal model, which improves accuracy by assigning feature weights. The above experimental results cover traffic flow prediction for all seven days of the week. Because weekend traffic flow is complex and unstable, and including it may reduce the accuracy of the model, most published experimental results omit weekend traffic prediction. However, many families choose to travel on weekends, and travel safety requires knowing the traffic flow at each time point on weekends. Therefore, this experiment compared the prediction results for weekday traffic only with the prediction results for the whole week (February 24th and 25th are non-working days). The results obtained are shown in

The experimental results show that the predictions fit better after adding the Attention Mechanism: in the whole-week traffic predictions, the RMSE of the Kal-CNN-BiLSTM-Attention model is 2.29 lower and the MAE is 1.18 lower than those of the CNN-BiLSTM-Kal model, as shown in

Date | Whole week RMSE | Whole week MAE | Workday RMSE | Workday MAE |
---|---|---|---|---|
2.21 | 22.29 | 15.03 | 20.32 | 14.45 |
2.22 | 19.33 | 13.82 | 17.13 | 13.08 |
2.23 | 16.33 | 11.77 | 14.56 | 11.27 |
2.24 | 13.75 | 10.69 | | |
2.25 | 17.19 | 12.55 | | |
2.26 | 18.81 | 13.42 | 17.01 | 12.83 |
2.27 | 15.58 | 12.12 | 16.12 | 12.26 |
Average | 17.61 | 12.77 | 17.03 | 12.78 |
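
As a quick consistency check, the averages in the table above can be recomputed from the daily values. The whole-week average runs over all seven days, while the workday average runs over the five working days only; the helper name `column_mean` is illustrative.

```python
# (RMSE, MAE) pairs copied from the table, February 21 to February 27.
whole_week = [(22.29, 15.03), (19.33, 13.82), (16.33, 11.77), (13.75, 10.69),
              (17.19, 12.55), (18.81, 13.42), (15.58, 12.12)]
# Workday-only runs: February 24 and 25 (non-working days) are excluded.
workday = [(20.32, 14.45), (17.13, 13.08), (14.56, 11.27),
           (17.01, 12.83), (16.12, 12.26)]


def column_mean(rows, index):
    """Average one column (0 = RMSE, 1 = MAE), rounded to two decimals."""
    return round(sum(row[index] for row in rows) / len(rows), 2)


whole_week_avg = (column_mean(whole_week, 0), column_mean(whole_week, 1))
workday_avg = (column_mean(workday, 0), column_mean(workday, 1))
```

The two MAE averages differ only in the second decimal place, which is the basis for the statement that including non-working days barely changes the accuracy of the model.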

ARIMA: It is a well-known time series analysis method for predicting future values.

SVR: The mapping relationship between input and output is obtained by training, and then the prediction is performed.

Gated Recurrent Unit (GRU): A gated recurrent unit network is trained on the data for short-term traffic flow forecasting.

Temporal Graph Convolutional Network (T-GCN): This model captures both spatial and temporal dependencies by using a combination of GCN and GRU, and finally performs short-term traffic flow forecasting.

The experimental results show that the prediction results of the CNN-BiLSTM-Attention model based on Kalman-filtered data processing are better than other models.

Model | RMSE | MAE |
---|---|---|
ARIMA | 47.39 | 35.36 |
SVR | 38.11 | 28.43 |
GRU | 35.88 | 26.58 |
T-GCN | 29.22 | 20.04 |
Kal-CNN-BiLSTM-ATT | 17.61 | 12.77 |

To verify the model’s applicability, the experiment was re-run on two other sets of traffic flow data obtained from different detectors. The experimental results are shown in

Detector | RMSE | MAE |
---|---|---|
Fifth detector | 15.50 | 11.31 |
Tenth detector | 15.74 | 11.85 |

This experiment first used the Kalman filter to smooth the original fluctuating data, which reduces the volatility caused by external interference and makes the model applicable to a broader range of data. Then the one-dimensional CNN was combined with BiLSTM to comprehensively analyze the three features of flow, occupancy, and speed together with the sequence information. Finally, the Attention Mechanism was added to the CNN-BiLSTM benchmark model to reduce the prediction error by assigning feature weights, thus improving the experimental results. Compared with the CNN-BiLSTM model alone, the Kal-CNN-BiLSTM-Attention model reduces RMSE by 17.58 and MAE by 12.38.

In this paper, multiple traffic flow prediction methods are effectively combined. Processing the data with a Kalman filter not only improves the prediction accuracy but also makes the model applicable to a broader range of data. This model can be applied to traffic flow forecasting in a variety of conditions, including weekends and holidays. At the same time, this experiment has some shortcomings: it does not consider external factors such as rain, snow, traffic accidents, and highway rerouting, which can cause deviations in the results of the prediction model. Traffic flow prediction needs to consider not only the local geographical correlation but also the global regional relationship. In future research, multi-scale self-attention networks can be used to capture the instantaneous dynamics at different time resolutions. All of these are focuses of future research in traffic flow prediction.

This research is supported by the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region (No. NJYT23060).

The authors declare that they have no conflicts of interest to report regarding the present study.