Traffic flow prediction in urban areas is essential to the Intelligent Transportation System (ITS). Short-Term Traffic Flow (STTF) prediction operates on traffic flow series and estimates the number of vehicles that will appear during the next time interval, here one hour. Precise STTF prediction is therefore critical in ITS. Various existing systems aim at short-term traffic forecasting, and achieving good precision has been a significant challenge over the past few years. The main objective of this paper is to propose a new model to predict STTF for every hour of a day. We propose a novel hybrid algorithm, named PALKNN, that combines Principal Component Analysis (PCA), Stacked Auto-Encoder (SAE), Long Short-Term Memory (LSTM), and K-Nearest Neighbors (KNN). Firstly, PCA removes unwanted information from the dataset and selects essential features. Secondly, SAE reduces the dimension of the input data using one-hot encoding so that the model can be trained faster. Thirdly, LSTM takes the input from SAE, where the data is sorted in ascending order based on the important features, and generates the derived value. Finally, the KNN regressor takes the output of LSTM to predict traffic flow. The forecasting performance of the PALKNN model is investigated on the Open Road Traffic Statistics dataset, Great Britain, UK. The model improves traffic flow prediction for every hour of a day with a minimal error value. An extensive experimental analysis was performed on this benchmark dataset. The results indicate a significant improvement of the proposed PALKNN model over approaches such as KNN, SARIMA, Logistic Regression, RNN, and LSTM, with a root mean square error (RMSE) of 2.07%, a mean square error (MSE) of 4.1%, and a mean absolute error (MAE) of 2.04%.

STTF forecasting is based on current and past traffic information recorded by sensing devices such as cameras, radar, and inductors. The extensive deployment of traffic sensors and the modernization of traffic sensing devices have greatly increased the volume of traffic flow data. With such extensive traffic data, data-driven mobility management and monitoring have become increasingly widespread. Accurate traffic flow data helps reduce traffic congestion, enhance traffic operation efficiency, and optimize travel decisions. Effective and accurate traffic flow forecasting is therefore a critical research topic in Intelligent Transportation Systems. It not only assists traffic managers in anticipating traffic operations and avoiding potential congestion but also offers more traffic information, allowing travelers to plan their trips ahead of time and adjust their routes [

The main contributions of this article are as follows.

We introduce PCA to remove unwanted information from the dataset and select essential features.

We introduce SAE to reduce the dimension of the input data. Because of the one-hot representation, the input data is highly visible to the model, which therefore executes faster.

The LSTM model derives a value by taking 100 initial one-hot values as input data.

All the derived values of the LSTM are given to KNN, which predicts the final traffic flow as the mean of the K nearest neighbors (where K = 5), found using the Euclidean distance.

We evaluate PALKNN statistically on a real dataset. The results demonstrate the effectiveness of PALKNN, which compares favorably with other forecasting approaches.

The first research on STTF prediction was conducted during the 1980s. Short-term traffic flow forecasting predicts traffic from several minutes to a few hours ahead, using present and historical data. In recent times, extensive work has been carried out on this issue. Existing traffic flow prediction techniques can be broadly divided into two categories: parametric models and non-parametric models.

The Auto-Regressive Integrated Moving Average (ARIMA) framework was developed in the late 1970s. It was proposed for precise road traffic flow forecasting, along with Seasonal Auto-Regressive Integrated Moving Average (SARIMA) multivariate time series models and spectral analysis. Kumar et al. [

Voort et al. [

Zhang et al. [

Wei et al. [

Farahani et al. [

Polson et al. [

Zheng et al. [

Shu et al. [

Therefore, in this section, we build the PALKNN method for short-term traffic flow prediction, exhibited below in

PCA is an unsupervised algorithm used to reduce the number of attributes (dimensions) of a data set. It cleans up data sets so they can be explored and analyzed more easily. A data set has many attributes, which constitute its dimensions. Feature removal is the technique of reducing the attribute space by removing features. The drawback is that the information carried by the removed variables is lost. In this work, PCA removes insignificant attributes from the data set. The PCA approach combines the concepts of the covariance matrix, eigenvectors, and eigenvalues. We must decide which attributes of the dataset to retain for further analysis. Because each eigenvalue roughly represents the significance of its corresponding eigenvector, the proportion of variance explained is the sum of the eigenvalues of the retained attributes divided by the sum of the eigenvalues of all features.

X denotes the n × n square matrix,

A denotes the eigenvector of the matrix.
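The eigenvalue-based variance proportion described above can be illustrated with a short sketch. It is not the authors' implementation: it assumes NumPy is available, and the function names `explained_variance_ratio` and `components_to_retain` as well as the sample matrix are hypothetical.

```python
import numpy as np

def explained_variance_ratio(data):
    """Return each eigenvalue's share of the total variance, largest first."""
    cov = np.cov(data, rowvar=False)             # covariance matrix of the attributes
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh sorts ascending; reverse it
    return eigenvalues / eigenvalues.sum()

def components_to_retain(ratios, threshold):
    """Smallest number of components whose cumulative share reaches threshold."""
    cumulative = np.cumsum(ratios)
    return int(np.argmax(cumulative >= threshold)) + 1

# Hypothetical sample: the second column is roughly twice the first,
# and the third column is almost constant.
sample = np.array([
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 5.1],
    [3.0, 6.2, 4.9],
    [4.0, 7.8, 5.0],
    [5.0, 10.1, 5.1],
])
ratios = explained_variance_ratio(sample)
kept = components_to_retain(ratios, threshold=0.95)
```

In this sample, almost all of the variance lies along a single direction, so one component is enough to pass a 95% retention threshold.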

SAE is a kind of unsupervised model. The SAE reduces the data dimension while retaining the attributes needed to reconstruct the data, using one-hot encoding (OHE). OHE is a binary vector representation of categorical variables: it expands a categorical variable with N distinct groups into N binary columns of 0s and 1s, where a 1 in the Nth column implies that the observation belongs to that category. OHE is performed here using Scikit-Learn. The SAE has an encoding function that begins with the shape of the input data and compresses it into an embedding. The decoding layer then takes that embedding and expands it back to the original shape. Finally, we take the reconstruction from the decoder and calculate the reconstruction loss.

Y denotes the original input,

Y′ denotes the reconstructed input,

W denotes the weight,

b denotes the bias.
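As a rough illustration of one-hot encoding and of a reconstruction loss between an original input Y and its reconstruction Y′, consider the sketch below. It uses plain Python rather than Scikit-Learn, it does not implement the encoder and decoder layers themselves, and the helper names `one_hot` and `binary_cross_entropy` are hypothetical. Binary cross-entropy is used because the SAE configuration in this paper lists it as the loss.

```python
import math

def one_hot(values):
    """Map each categorical value to a binary vector containing a single 1."""
    categories = sorted(set(values))                  # the N distinct groups
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1                         # mark the matching category
        encoded.append(row)
    return categories, encoded

def binary_cross_entropy(original, reconstructed, eps=1e-7):
    """Mean binary cross-entropy between a 0/1 vector and its reconstruction."""
    total = 0.0
    for y, y_hat in zip(original, reconstructed):
        y_hat = min(max(y_hat, eps), 1.0 - eps)       # clip to avoid log(0)
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))
    return total / len(original)

categories, encoded = one_hot(["N", "E", "S", "E"])
uninformed_loss = binary_cross_entropy(encoded[0], [0.5, 0.5, 0.5])
perfect_loss = binary_cross_entropy(encoded[0], encoded[0])
```

A reconstruction that reproduces the one-hot vector exactly yields a loss close to zero, whereas an uninformative reconstruction of 0.5 everywhere yields ln 2 per element.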

LSTM is a type of Recurrent Neural Network (RNN) regularly used in artificial intelligence applications for analyzing and predicting time series data. LSTM helps resolve the vanishing and exploding gradient problems and also mitigates the problem of carrying past information forward. LSTM is trained using backpropagation through time. The primary objective of LSTM cells is to remember the important components of a sequence while ignoring the less crucial ones. LSTM networks use memory blocks linked together in layers rather than neurons. A block contains components that make it more capable than a traditional neuron and allow it to remember recent sequences. A block includes gates that regulate the block’s state and output. A block operates on an input sequence, and every gate inside a block uses sigmoid activation units to control whether it is triggered, which makes the change of state and the addition of information flowing through the block conditional. There are three kinds of gates, namely the Forget Gate, Input Gate, and Output Gate, defined in the sections below [

The forget gate is in charge of determining which information from the previous step should be discarded. It produces a value between 0 and 1 for each element of the previous cell state C_{t-1}, according to h_{t-1} (the previous hidden state) and x_{t} (the present input at time step t). A value of 1 keeps the content entirely, a value of 0 discards it entirely, and intermediate values determine how much of the previous state is passed over to the next state.

The input gate layer determines which values are updated. A tanh layer subsequently generates candidate values, computed from the previous hidden state h_{t-1} and the input x_{t}, that can be added to the previous cell state C_{t-1} to form the new cell state C_{t}.

The cell state is the memory of the LSTM. Thanks to this state, LSTM outperforms vanilla Recurrent Neural Networks on longer input sequences. At every time step, the previous cell state C_{t-1} is combined with the forget gate to determine which information moves forward, and the result is then combined with the input gate to create the new cell state.

The output gate depends on the cell state in
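The gate descriptions in this section correspond to the standard LSTM update, which can be sketched for a single hidden unit as follows. This is an illustrative sketch in plain Python, not the authors' 36-neuron configuration. The function name `lstm_step`, the `params` dictionary, and its gate keys are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of an LSTM cell with a single hidden unit.

    params maps each gate name to a tuple (w_h, w_x, b).
    """
    def pre_activation(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre_activation("forget"))                # share of C_{t-1} to keep
    i_t = sigmoid(pre_activation("input"))                 # share of the candidate to add
    c_candidate = math.tanh(pre_activation("candidate"))   # candidate values
    c_t = f_t * c_prev + i_t * c_candidate                 # new cell state C_t
    o_t = sigmoid(pre_activation("output"))                # share of the state to expose
    h_t = o_t * math.tanh(c_t)                             # new hidden state h_t
    return h_t, c_t

# With all weights and biases at zero, every gate outputs 0.5.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "candidate", "output")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

With all parameters at zero, half of the previous cell state is kept and nothing new is added, so the cell state halves from 2.0 to 1.0.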

KNN is a supervised machine learning algorithm that can solve both regression and classification problems. KNN predicts outcomes on the test data according to the properties of the existing training data points. To do so, the distance between the testing and training data is calculated. When a new data point is added, the KNN algorithm examines its properties/attributes and then associates the new point with the closest existing training data points, i.e., those that share similar features. To classify new data points, KNN computes the distance between them and the training points. Euclidean distance is the most commonly used technique for calculating this distance [
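A minimal sketch of KNN regression with the Euclidean distance, where the prediction is the mean target of the k nearest training points, is given below. It is written in plain Python for illustration only. The function name `knn_regress` and the toy data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def knn_regress(train_x, train_y, query, k=5):
    """Predict the mean target of the k training points nearest to query."""
    distances = [
        (math.dist(x, query), y)          # Euclidean distance to each training point
        for x, y in zip(train_x, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy one-dimensional data: five points near the query and two far away.
train_x = [[0], [1], [2], [3], [4], [10], [11]]
train_y = [10, 20, 30, 40, 50, 500, 600]
prediction = knn_regress(train_x, train_y, query=[2], k=5)
```

The two distant points do not influence the prediction because only the five closest neighbors are averaged.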

The data is then transformed into the time-series data frame structure that the proposed model uses as input to the LSTM. This data frame was prepared after PCA removed all the unwanted features from the UK traffic dataset, which is an automated rather than a manual process, and after SAE converted the data into a one-hot representation. In general, D_{t1} is the one-hot representation of all_motor_vehicles at time frame t_{1}. Ten such one-hot representations of all_motor_vehicles at sequential time frames, sorted in ascending order by date, time, and travel direction, are used to derive D_{t11}. The time-series data frame given to the LSTM is shown in

Input | Output |
---|---|
D_{t1}, D_{t2}, D_{t3}…D_{t10} | D_{t11} |
D_{t2}, D_{t3}, D_{t4}…D_{t10}, D_{t11} | D_{t12} |
D_{t3}, D_{t4}, D_{t5}…D_{t10}, D_{t11}, D_{t12} | D_{t13} |

D_{tN} is the traffic data at the N^{th} time frame, where N > 0.
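The sliding-window layout of the table above, in which ten consecutive values form the input and the eleventh is the output, can be sketched as follows. The function name `make_windows` is hypothetical, and plain integers stand in for the one-hot representations.

```python
def make_windows(series, window=10):
    """Pair every run of `window` consecutive values with the value that follows."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])   # D_t(start+1) .. D_t(start+window)
        targets.append(series[start + window])        # the next time frame
    return inputs, targets

# Integers 1..13 stand in for D_t1 .. D_t13.
series = list(range(1, 14))
inputs, targets = make_windows(series, window=10)
```

For thirteen time frames this produces the three rows of the table: the windows start at D_{t1}, D_{t2}, and D_{t3}, and the targets are D_{t11}, D_{t12}, and D_{t13}.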

Pseudo Code of PALKNN

-----------------------------------------------------------------------------------------------------------------------

Input: UK Traffic dataset.

Output: Traffic flow

-----------------------------------------------------------------------------------------------------------------------

1. Load the UK traffic dataset [from 2000–2020].

2. Apply data pre-processing activities:

2.1 Remove outliers, if any exist.

2.2 Transform the direction_of_travel column values as E-1, W-2, N-3, and S-4, and the count_date column as yyyy-mm-dd.

Input Data Preparation:

3. Add the StandardScaler transform technique with a range between −1 and 1.

4. Add a PCA instance with a retaining ability of 99%.

5. Add the reverse StandardScaler transform.

6. StackedAutoEncoder:

6.1 Add 2 encoding layers with 3 neurons.

6.2 Add 2 decoding layers with 3 neurons.

NOTE: 3 neurons because we have considered only date, direction, and all_motor_vehicles.

7. Sort the dataset based on the count_date, hour, and direction_of_travel columns.

8. Consider only the all_motor_vehicles column to prepare the time-series (sequential) dataset, LSTM Times-series Data [

Model Construction:

9. Create a Sequential model with

10. LSTM:

10.1 Add 50 hidden layers with neurons as 36.

10.2 Add 3 dense layers with neurons as 50 5 respectively.

10.3 Add a single flatten layer.

KNN:

10.4 Add one output layer with a single neuron and K = 5, where K is the number of nearest neighbors.

10.5 Add the Euclidean distance.

11. Fit or train the model with the input data for 275 epochs and capture the loss.

12. Predict traffic flow.

-------------------------------------------------------------------------------------------------------------------------
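The pre-processing and sorting steps of the pseudo code (encoding direction_of_travel, normalizing count_date, and sorting by count_date, hour, and direction_of_travel) can be sketched in plain Python. This is only an illustration: the input date format `dd/mm/yyyy`, the function name `preprocess`, and the sample records are assumptions, while the column names and the direction codes are taken from the pseudo code.

```python
from datetime import datetime

# Direction codes from the pseudo code: E-1, W-2, N-3, S-4.
DIRECTION_CODES = {"E": 1, "W": 2, "N": 3, "S": 4}

def preprocess(records, date_format="%d/%m/%Y"):
    """Encode direction, rewrite count_date as yyyy-mm-dd, then sort the rows."""
    cleaned = []
    for rec in records:
        row = dict(rec)                                   # leave the input untouched
        row["direction_of_travel"] = DIRECTION_CODES[rec["direction_of_travel"]]
        parsed = datetime.strptime(rec["count_date"], date_format)
        row["count_date"] = parsed.strftime("%Y-%m-%d")   # ISO dates sort chronologically
        cleaned.append(row)
    cleaned.sort(key=lambda r: (r["count_date"], r["hour"], r["direction_of_travel"]))
    return cleaned

# Hypothetical records; the input date format is an assumption.
records = [
    {"count_date": "02/01/2020", "hour": 7, "direction_of_travel": "N", "all_motor_vehicles": 120},
    {"count_date": "01/01/2020", "hour": 8, "direction_of_travel": "W", "all_motor_vehicles": 90},
    {"count_date": "01/01/2020", "hour": 8, "direction_of_travel": "E", "all_motor_vehicles": 75},
]
ordered = preprocess(records)
flows = [row["all_motor_vehicles"] for row in ordered]
```

The resulting `flows` list is the ordered all_motor_vehicles series from which the sequential dataset would then be built.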

The parameter configurations of PCA, SAE, LSTM, and KNN are shown below in

Parameter configuration of PCA

S.no. | Summary | Values |
---|---|---|
1 | Total input parameters | 1 (All motor vehicles) 2D array |
2 | Total output parameters | 1 (2-dimensional array) |
3 | Retain functionality | 95 percent |

Parameter configuration of SAE

S.no. | Summary | Values |
---|---|---|
1 | Total input parameters | 1 (All motor vehicles) 2D array |
2 | Total output parameters | 1 (2-dimensional array) |
3 | Number of encoder layers | 2 |
4 | Number of decoder layers | 2 |
5 | Batch size (records loaded to train) | 256 |
6 | Number of epochs (forward + backward) | 500 |
7 | Loss | Binary cross-entropy |

Parameter configuration of LSTM

S.no. | Summary | Values |
---|---|---|
1 | Total input parameters | 1 (3-dimensional array) |
2 | Total output parameters | 1 (2-dimensional array) |
3 | Total hidden layers | 50 |
4 | Neurons in each hidden layer | 36 |
5 | Batch size (records loaded to train) | 256 |
6 | Number of epochs (forward + backward propagation) | 300 (efficient is 275) |
7 | Activation function | Softmax |
8 | Loss function | Mean Absolute Error (MAE) |
9 | Number of dense layers | 3 |
10 | Number of neurons in dense layers | 50, 25, successively |
11 | Number of flatten layers | 1 |

Parameter configuration of KNN

S.no. | Summary | Values |
---|---|---|
1 | Total input parameters | 1 (2-dimensional array) |
2 | Total output parameters | 1 (1-dimensional array) |
3 | Batch size (records loaded to train) | 256 |
4 | Number of epochs (forward + backward propagation) | 300 (efficient is 275) |
5 | Activation function | Sigmoid |
6 | Loss function | Mean Absolute Error (MAE) |
7 | Number of neurons in output layers | 1 |
8 | K nearest neighbors | 5 |
9 | Distance | Euclidean distance |

In this work, the traffic flow dataset was collected from zone 1 to zone 5 Department of Transport, Great Britain, United Kingdom (UK) [

In this experiment, the data set was divided into two parts: 75% for the training dataset and 25% for the testing dataset. The data set includes a feature (column) called all_motor_vehicles. This feature comprises counts of pedal cycles, two-wheeled motor vehicles, cars, taxis, buses, coaches, Light Goods Vans, two-rigid-axle Heavy Goods Vans, three-rigid-axle Heavy Goods Vans, four-or-more-rigid-axle Heavy Goods Vans, and articulated-axle Heavy Goods Vans, for all weekdays and days off, during both busy and quiet periods.

At the initial stage, the proposed PALKNN model pre-processes the input data by standardizing it with the StandardScaler approach from the scikit-learn library. Standardization is a scaling technique that centers values on the mean with a unit standard deviation; that is, the attribute’s mean becomes zero and the resulting distribution has a unit standard deviation. In this work, the prior hourly traffic flow, i.e., the PALKNN time series data, is considered. This time series data is used for both training and testing.
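Standardization and its inverse can be sketched in plain Python as follows. The sketch mirrors the behavior of scikit-learn's StandardScaler, which divides by the population standard deviation, but it is not the library code. The function names `standardize` and `inverse_standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Centre values on their mean and scale them to unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)                 # population standard deviation
    if sigma == 0:                         # constant column: only centre it
        return [0.0 for _ in values], mu, sigma
    return [(v - mu) / sigma for v in values], mu, sigma

def inverse_standardize(scaled, mu, sigma):
    """Map standardized values back to the original scale."""
    return [z * sigma + mu for z in scaled]

# Hypothetical hourly vehicle counts with mean 5 and standard deviation 2.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
scaled, mu, sigma = standardize(counts)
restored = inverse_standardize(scaled, mu, sigma)
```

After scaling, the values have zero mean and unit standard deviation, and the inverse transform recovers the original counts.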

The performance of the STTF models is evaluated in this work using three indexes: Root Mean Square Error (RMSE), Mean Square Error (MSE), and Mean Absolute Error (MAE), shown in
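The three error indexes follow their standard definitions, which can be sketched in plain Python as below. The function names and sample values are hypothetical. The sketch returns errors in the units of the data and does not reproduce the normalization behind the percentage values reported in this paper.

```python
import math

def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of the mean square error."""
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical hourly counts and predictions.
actual = [100, 200, 300]
predicted = [110, 190, 320]
scores = {
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
}
```

Because RMSE squares the errors before averaging, it penalizes the single large miss (20 vehicles) more strongly than MAE does.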

To validate the proposed model, we compare it with five models: the KNN model, SARIMA model, Logistic Regression model, Recurrent Neural Network (RNN) model, and LSTM model, using the UK traffic dataset. The error performance of the models on the training and testing data is shown below in

Algorithm | Training RMSE | Training MSE | Training MAE | Testing RMSE | Testing MSE | Testing MAE |
---|---|---|---|---|---|---|
KNN | 38.2 | 33.95 | 30.74 | 39.87 | 37.61 | 40.9 |
SARIMA | 34.9 | 30.9 | 28.9 | 36.91 | 34.08 | 33.74 |
Logistic regression | 14.34 | 11.34 | 9.12 | 17.41 | 14.53 | 15.03 |
RNN | 10.45 | 8.45 | 7.15 | 13.84 | 13.29 | 9.64 |
LSTM | 5.3 | 3.3 | 2.86 | 6.04 | 5.98 | 4.05 |
PALKNN (proposed) | 1.9 | 3.5 | 1.8 | 2.07 | 4.1 | 2.04 |

After splitting the UK traffic dataset into training and testing data sets with a ratio of 75% to 25%, the proposed model was trained on the training data set and evaluated on both the training and testing data sets. As the epochs (training loops) increased, the loss on the testing data set fell from 60% to 5% and the loss on the training data set fell from 53% to 5%. The training and testing losses converged four times, at epochs 69, 100, 150, and 270, with loss rates of 52%, 31%, 30%, and 7%, respectively. Later, the training loss reached 4% and the testing loss reached 6% before possible divergence began. The loss of the proposed model is around 6%, which is shown below in

The comparison between predicted traffic and actual traffic per hour of a day is shown below in

The traffic predicted by the different models, namely PALKNN, LSTM, RNN, Logistic Regression, SARIMA, and KNN, is shown below in

Similarly, in training, the proposed hybrid model PALKNN achieved error values of 1.9%, 3.5%, and 1.8% for RMSE, MSE, and MAE, respectively, whereas LSTM is 4%, and other algorithms are above 5%. Also, the proposed technique has a 3.5% MSE error value, but all peer competitive techniques possess above 5%. We therefore conclude that the PALKNN hybrid model has a lower error rate in training, as shown in above

Likewise, in testing, the proposed hybrid model PALKNN has low RMSE, MSE, and MAE error values of 2.07%, 4.1%, and 2.04%, respectively, whereas for the other techniques the values are above 4%. Even in testing, the proposed algorithm performs better, with a lower error rate, as shown below in

Unlike classifiers, regressor techniques have no accuracy metric, because a classifier has a defined set of outputs (such as True or False, Yes or No, 0 or 1, High or Low). In contrast, the output of a regressor is a real number ranging from −∞ to +∞ (negative infinity to positive infinity). Because of this output behavior, we use error rates to assess the performance of regressor techniques. If the error ratio is between 1% and 25%, the regressor model is regarded as a good fit for the business problem statement on the chosen data set.

In this research, the proposed model is a regressor because the output layer contains a KNN regressor to predict the traffic flow value. We conclude that the proposed model is the best fit for short-term traffic flow prediction because its RMSE, MSE, and MAE ratios are less than 5% on both the training and testing data sets, even though the values of the other models are less than 25%, as shown in the above

This work proposes an STTF prediction model, PALKNN, based on ensemble learning of PCA, SAE, LSTM, and KNN. PCA selects important features from the dataset. SAE performs dimensionality reduction using a one-hot representation, converting the data into 0s and 1s. PCA and SAE thus prepare well-defined and highly visible input data for the LSTM, which speeds up execution. The LSTM then takes the input sorted in ascending order and generates the derived values. The KNN takes these derived values to predict traffic flow values. Finally, the KNN stage of the proposed hybrid model PALKNN uses the Euclidean distance to find the K nearest neighbors (where K = 5), whose mean forms the prediction, which improves performance by reducing the error rate. The research approach mainly focuses on STTF prediction for every hour of a day, which makes it easier for commuters to plan their journeys. The performance of the PALKNN prediction model was validated using an actual data set from the road traffic statistics, UK. The metric findings below show that the proposed algorithm outperformed existing approaches such as KNN, SARIMA, Logistic Regression, RNN, and LSTM models.

Traffic prediction is more reliable and very close to the actual value, with quicker execution compared with other models.

Lower error rates for RMSE, MSE, and MAE on the testing data, with values of 2.07%, 4.1%, and 2.04%, respectively.

Lower error rates for RMSE, MSE, and MAE on the training data, with values of 1.9%, 3.5%, and 1.8%, respectively.

In future work, to make the traffic flow prediction more stable, the proposed PALKNN regressor algorithm will be replaced with a PALKNN classifier (with both LSTM and KNN as classifiers) having two class labels, HIGH and LOW traffic.

The authors received no specific funding for this study.

The authors declare that they have no conflicts of interest to report regarding the present study.