For a 5G wireless communication system, a convolutional neural network (CNN) is employed to synthesize a robust channel state estimator (CSE). The proposed CSE extracts channel information from transmit-and-receive pairs through offline training to estimate the channel state information. It also utilizes pilots to provide more helpful information about the communication channel. The performance of the proposed CNN-CSE is compared with previously published results for bidirectional long short-term memory (BiLSTM) and LSTM NN-based CSEs. The CNN-CSE achieves outstanding performance only when sufficient pilots are available and loses its functionality with limited pilots, in contrast to the BiLSTM- and LSTM-based estimators. Using three different loss function-based classification layers and the Adam optimization algorithm, a comparative study was conducted to assess the performance of the presented DNN-based CSEs. The BiLSTM-CSE outperforms the LSTM, CNN, conventional least squares (LS), and minimum mean square error (MMSE) CSEs. In addition, the computational and learning time complexities of the DNN-CSEs are provided. These estimators are promising for 5G and future communication systems because they can analyze large amounts of data, discover statistical dependencies, learn correlations between features, and generalize the acquired knowledge.

In the coming years, the exponential growth in wireless throughput for different wireless services is expected to continue. 5G communication systems have been designed to meet the vast increase in data traffic and achieve robust communications under non-stationary channel statistics. 5G orthogonal frequency division multiplexing (OFDM) communication systems have been deployed to mitigate frequency-selective fading effects and other channel imperfections, offering more reliable communication and significantly increasing spectral efficiency [

One of the most important factors affecting the effectiveness of CSEs is the a priori information about the wireless channel environment provided by pilots. Both the transmitter and the receiver should know the pilot signals to estimate the channel information and efficiently recover the desired signal. Channel state estimation is worthless if no or inadequate a priori knowledge is available (no or limited pilots) [

Least squares (LS) estimation, well known among conventional channel state estimators, has a low computational cost since it requires no prior channel information. On the other hand, the LS estimator produces substantial channel estimation errors in real applications, particularly for multipath channels. The minimum mean square error (MMSE) CSE grants the best channel information estimation compared with the LS CSE [
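As an illustrative sketch (not the paper's implementation; the function names and the simple per-subcarrier model are our assumptions), the LS estimate and an MMSE refinement that exploits second-order channel statistics can be written as:

```python
import numpy as np

def ls_estimate(x_pilot, y_pilot):
    """Per-subcarrier least-squares channel estimate: H_ls = Y / X.
    Needs no prior channel knowledge."""
    return y_pilot / x_pilot

def mmse_estimate(h_ls, r_hh, snr_linear):
    """Linear MMSE refinement of the LS estimate using the channel
    autocorrelation matrix r_hh (second-order statistics assumed known)."""
    n = r_hh.shape[0]
    w = r_hh @ np.linalg.inv(r_hh + (1.0 / snr_linear) * np.eye(n))
    return w @ h_ls
```

The contrast in the text follows directly: `ls_estimate` uses only the pilot pair, while `mmse_estimate` additionally requires `r_hh` and the SNR, which is why MMSE performs better when those statistics are available.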

Deep learning neural network-based wireless communication applications have recently received considerable attention, in areas such as coding and decoding, automatic signal classification, MIMO detection, and channel estimation [

In [

In this work, we expand our preceding research work [

The current study proposes a CNN-based CSE for OFDM networks. This is the first time a CNN has been used as a CSE without BiLSTM or LSTM recurrent DNNs. Prior certainty about channel statistics is not required for the CNN-based CSE. The performance of the new CNN estimator is compared with the previously developed LSTM and BiLSTM estimators, as well as the conventional estimators, in terms of symbol error rate. The comparative study between the CNN-, BiLSTM-, and LSTM-based CSEs is conducted using the adaptive moment estimation (Adam) optimization algorithm and three classification layers. One of the three classification layers is built using the most common loss function, crossentropyex, while the other two classification layers are those proposed in [

The rest of this paper is organised as follows. The recurrent (LSTM and BiLSTM) DNN-based CSEs are illustrated in the

A recurrent BiLSTM-based CSE is provided in this part. The BiLSTM network is a variant of recurrent LSTM NN, which can learn long-term relationships between input data time steps [

The BiLSTM architecture comprises two distinct LSTM NNs with forward and backward information propagation directions. The LSTM architecture comprises input, forget, and output gates, as well as a memory cell. The LSTM NN can successfully store long-term memory thanks to the forget and input gates. The primary structure of the LSTM cell is depicted in
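The gating mechanism described above can be sketched as one generic LSTM time step (a textbook formulation, not the paper's MATLAB code; all names are ours). A BiLSTM simply runs one such cell forward and another backward over the sequence and combines both outputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W (4n x nx), U (4n x n), and b (4n,) hold the
    stacked parameters of the input (i), forget (f), cell-candidate (g),
    and output (o) gates for hidden size n."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b        # stacked gate pre-activations
    i = sigmoid(z[0:n])               # input gate: what to write
    f = sigmoid(z[n:2 * n])           # forget gate: what to keep
    g = np.tanh(z[2 * n:3 * n])       # candidate memory content
    o = sigmoid(z[3 * n:4 * n])       # output gate: what to expose
    c = f * c_prev + i * g            # memory cell update
    h = o * np.tanh(c)                # present cell output
    return h, c
```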

The output gate determines the present cell output

The output unit receives both the forward and backward propagation information simultaneously. As a result, as illustrated in

The constructed BiLSTM-based CSE uses a cross-entropy function for the kth mutually exclusive class (crossentropyex)-based classification layer as the last CSE layer. In addition, the authors have developed a mean absolute error (MAE)-based classification layer and a sum of squared errors (SSE)-based classification layer [

An array with the following five layers is used to construct the BiLSTM NN-based channel state estimator: sequence input, BiLSTM, fully connected, softmax, and output classification. The maximum input size was set to 256. The BiLSTM layer comprises 16 hidden units and outputs the last element of the sequence. A fully connected (FC) layer of size four, followed by a softmax layer and finally a classification layer, specifies the four classes. The construction of the suggested BiLSTM and LSTM estimators is shown in

CNN architectures have been suggested for image denoising techniques and have received significant attention in the image processing field. CNN architectures may learn how to map noisy images to clean images [

In convolutional layers, several convolution filters process the received signal. Assume that the ℓ^{th} convolutional layer (CL) of the CNN-based CSE receives the output of the (ℓ−1)^{th} layer. The convolution process in the ℓ^{th} layer is formulated as follows:

The ℓ^{th}-layer filter is denoted by
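A minimal sketch of the operation performed by one filter in a convolutional layer (the generic "valid" cross-correlation that deep-learning frameworks implement as convolution; function and variable names are ours):

```python
import numpy as np

def conv2d_valid(x, w, b=0.0):
    """'Valid' 2-D convolution (cross-correlation, as in deep-learning
    frameworks) of one input map x with one filter w, plus bias b."""
    H, W_ = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W_ - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # each output sample is the windowed inner product plus bias
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * w) + b
    return out
```

A full convolutional layer applies many such filters to all input maps and sums the results per output map.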

Deep network training may be made faster by reducing the internal covariate shift, which is achieved via the batch normalization (BN) layer. In the context of layer training, the internal covariate shift is defined as the change in the distribution of each layer's output. These changes are mostly caused by imbalanced nonlinear mappings. The BN layer is usually placed before the activation function.
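The normalization applied by a BN layer over a mini-batch can be sketched as follows (a generic formulation; `gamma` and `beta` are the learnable scale and shift, and `eps` guards against division by zero):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature of a mini-batch x (rows = samples) to zero
    mean and unit variance, then apply learnable scale/shift."""
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized activations
    return gamma * x_hat + beta
```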

The activation function maps outputs in a nonlinear fashion. Sigmoid and tanh are the most commonly used activation functions. In this paper, the rectified linear unit (ReLU) is adopted as the activation function; it can be defined as ReLU(x) = max(0, x).
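As a one-line sketch of the adopted activation:

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x): identity for positive inputs, zero otherwise."""
    return np.maximum(0.0, x)
```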

In a CNN, the pooling layer is an essential type of layer. The convolutional layer uses many convolutions to generate a series of outputs, each of which is passed through the ReLU function. The layer's output is then further modified by a pooling mechanism. A pooling function substitutes the network's output at a particular position with a summary statistic of neighboring outputs. In this research, max pooling is employed, which reports the maximum output inside a pooling window.
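Max pooling over non-overlapping windows can be sketched as follows (the 2×2 window and stride are illustrative defaults, not values from the paper):

```python
import numpy as np

def max_pool(x, k=2, stride=2):
    """Max pooling: replace each k x k window of x with its maximum."""
    H, W = x.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = x[r * stride:r * stride + k,
                          c * stride:c * stride + k].max()
    return out
```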

A fully connected (FC) layer multiplies the input by a weight matrix and then adds a bias vector. In this paper, the convolutional layer is followed by one FC layer. In an FC layer, all neurons are connected to all the neurons in the previous layer. This layer combines all the features learned by the previous layers across the transmission channel to estimate the channel information. For channel state estimation problems, the last FC layer combines the features to estimate the information state of a particular channel.

A softmax layer applies a softmax function to the input. The softmax function is the output unit activation function after the last FC layer for multi-class classification problems. It is also known as the normalized exponential and can be considered the multi-class generalization of the logistic sigmoid function.
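A numerically stable softmax sketch (the max-shift before exponentiating is a standard implementation detail, not from the paper):

```python
import numpy as np

def softmax(z):
    """Normalized exponential: maps scores z to a probability
    distribution; shifting by max(z) avoids overflow without
    changing the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()
```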

For typical classifiers, the classification layer generally comes after the softmax layer. During the training process, in the classification layer, the optimization algorithm receives the outputs from the softmax function and assigns each input to one of the K mutually exclusive classes using the crossentropy function for a 1-of-K coding scheme.
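The cross-entropy loss for a 1-of-K coding scheme can be sketched as follows (a generic definition; the epsilon guard against log(0) is our addition):

```python
import numpy as np

def cross_entropy(p, targets):
    """Cross-entropy for one-hot (1-of-K) targets:
    L = -sum_k t_k * log(p_k), averaged over the mini-batch.
    p and targets are (batch, K) arrays; rows of p sum to 1."""
    eps = 1e-12  # guard against log(0)
    return -np.mean(np.sum(targets * np.log(p + eps), axis=1))
```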

The next subsections describe the conventional OFDM wireless communication technology as well as deep offline learning of the proposed channel state estimators.

A serial-to-parallel (S/P) converter is utilized on the transmitter side to transform the transmitted symbols with pilot signals into parallel data streams. Then, inverse discrete Fourier transform (IDFT) transforms the signal into the time domain. Finally, to mitigate the effects of inter-symbol interference (ISI), a cyclic prefix (CP) must be inserted. The CP length must be greater than the channel’s maximum spreading delay.
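The transmitter chain above, and the matching receiver front end, can be sketched with NumPy's FFT routines. The 64-subcarrier and 16-sample-CP sizes follow the simulation parameters given later; the function names are ours, and the ideal (distortionless) round trip shown in the usage is only a sanity check, not a channel model.

```python
import numpy as np

def ofdm_transmit(symbols, cp_len=16):
    """Map one block of frequency-domain symbols to the time domain
    (IDFT) and prepend a cyclic prefix (last cp_len samples)."""
    x = np.fft.ifft(symbols)                  # IDFT: frequency -> time
    return np.concatenate([x[-cp_len:], x])   # CP insertion

def ofdm_receive(rx, n_sub=64, cp_len=16):
    """Strip the cyclic prefix and return to the frequency domain."""
    return np.fft.fft(rx[cp_len:cp_len + n_sub])
```

Over an ideal channel, `ofdm_receive(ofdm_transmit(s))` recovers `s` exactly; the CP makes the linear channel convolution circular so that per-subcarrier equalization remains valid when a multipath channel is inserted between the two calls.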

A multipath channel is formed by complex random variables in a sample space

In the frequency domain, the received signal may be described as
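Under the usual assumption that the CP exceeds the maximum delay spread, this relation takes the standard per-subcarrier form:

```latex
Y(k) = H(k)\,X(k) + W(k), \qquad k = 0, 1, \ldots, N-1,
```

where $X(k)$ and $Y(k)$ are the transmitted and received symbols on subcarrier $k$, $H(k)$ is the channel frequency response, and $W(k)$ is additive noise.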

The OFDM frame includes pilot symbols in the first OFDM block and transmitted data in the subsequent blocks. The channel may appear stationary during one frame, yet it can change between frames. The offered BiLSTM-based CSE accepts data at its input and recovers it at its output [

In the realm of wireless communication, DNNs are the state-of-the-art technique, but they have high computational complexity and a lengthy training period. The most dominant devices for training DNNs are GPUs [

The learning dataset for one subcarrier is randomly generated during offline training. The transmitting end sends OFDM frames to the receiving end through the adopted channel. The received signal is retrieved from transmitted frames that have been exposed to various channel impairments.

Traditional CSEs rely strongly on theoretical channel models that are linear, stationary, and follow Gaussian statistics. However, existing wireless systems contain other flaws and unidentified surrounding effects that are difficult to account for with precise channel models; as a result, researchers have created several channel models that accurately characterize practical channel statistics. Modelling may produce trustworthy and practical training datasets using these channel models [

In the 3GPP 5G TR 38.901 version 16.1.0 Release 16 channel model [

The Adam optimization algorithm trains the proposed CSEs by minimizing a specific loss function. The loss function is defined as the difference between the estimator's responses and the originally transmitted data. A variety of functions can represent the loss function, which is an indispensable part of the classification layer. The frequently used classification layer is mainly based on the crossentropyex loss function in MATLAB. Two more classification layers employing the MAE and SSE loss functions were established in this study. The suggested estimators' performance is studied when the three classification layers are used. The used loss functions can be defined as follows:
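The three loss functions can be sketched as follows (generic definitions consistent with the classification layers named above; the epsilon guard in the cross-entropy is our addition):

```python
import numpy as np

def crossentropyex(p, t):
    """Cross-entropy for K mutually exclusive classes (one-hot t),
    averaged over the mini-batch."""
    return -np.mean(np.sum(t * np.log(p + 1e-12), axis=1))

def mae(p, t):
    """Mean absolute error between responses p and targets t."""
    return np.mean(np.abs(p - t))

def sse(p, t):
    """Sum of squared errors between responses p and targets t."""
    return np.sum((p - t) ** 2)
```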

In the offline training phase, the initially created DNN-based CSE takes the training data as pairs. Each pair consists of a specific input (transmitted data

Many simulations are carried out to assess the performance of the suggested estimators. The proposed DNN-based (CNN, LSTM [

The properties of the provided DNNs-based CSEs, as well as their training settings, are listed in

| Parameter | Value |
|---|---|
| Input layer size | 256 |
| BiLSTM layer size | 30 H. N. |
| LSTM layer size | 30 H. N. |
| CNN layers sizes | 32 and 175 H. N. |
| F. C. layer size | 4 |
| Classification layers | Based on crossentropyex, MAE, and SSE loss functions |
| Mini-batch size | 1000 |
| Epochs № | 1000 |
| Iterations № | 8000 |
| Optimization algorithm | Adam |

| Parameter | Value |
|---|---|
| Modulation scheme | QPSK |
| C. Freq. | 2.6 GHz |
| Paths № | 24 |
| CP length | 16 |
| Subcarrier № | 64 |
| Pilots № | 64, 8, and 4 |

The CNN architecture consists of two 2D-convolution layers (2D-CLs) and a single FC layer. The first 2D-CL is followed by a batch normalization (BN) layer, a rectified linear unit (ReLU) activation layer, and a max-pooling layer. The second 2D-CL is followed by a BN layer, a ReLU layer, and an FC layer of size 4. The softmax activation is in the output layer. The CNN's training options, such as the loss function-based classification layers, mini-batch size, epoch number, and learning algorithm, are the same as in

The performance of the analyzed estimators is assessed using crossentropyex, MAE, and SSE-based classification layers and 4, 8, and 64 pilots. For all simulation investigations, the Adam optimization algorithm is employed.

Using the crossentropyex-based classification layer and enough pilots (64), the developed BiLSTM_{(crossentropyex)} CSE beats the LSTM_{(crossentropyex)}, CNN_{(crossentropyex)}, and conventional CSEs throughout the whole SNR range, as depicted in

The CNN_{(MAE)} CSE outperforms both the BiLSTM_{(MAE)} and LSTM_{(MAE)} CSEs. The BiLSTM_{(MAE)} and LSTM_{(MAE)} CSEs outperform the LS CSE throughout the SNR ranges of [0–18 dB] and [0–15 dB], respectively. The BiLSTM_{(MAE)}, LSTM_{(MAE)}, and CNN_{(MAE)} estimators are comparable to the MMSE CSE throughout the SNR ranges of [0–10 dB], [0–4 dB], and [0–12 dB], respectively. The MMSE CSE outperforms the others outside these SNR ranges.

When the same number of pilots is used and the SSE-based classification layer is employed, the BiLSTM_{(SSE)}, LSTM_{(SSE)}, CNN_{(SSE)}, and MMSE CSEs are on par at low SNRs [0–7 dB]. The MMSE CSE outstrips both the BiLSTM_{(SSE)} and LSTM_{(SSE)} CSEs from 8 dB. The LS CSE beats LSTM_{(SSE)}, BiLSTM_{(SSE)}, and CNN_{(SSE)} starting from 13, 15, and 18 dB, respectively. BiLSTM_{(SSE)} outperforms LSTM_{(SSE)} starting from 9 dB, while CNN_{(SSE)} outperforms both BiLSTM_{(SSE)} and LSTM_{(SSE)} starting from 8 dB.

Concisely, at 64 pilots, BiLSTM_{(crossentropyex)} outperforms all examined estimators: MMSE, LSTM_{(crossentropyex)}, LSTM_{(MAE)}, LSTM_{(SSE)}, CNN_{(crossentropyex)}, CNN_{(MAE)}, CNN_{(SSE)}, and LS. Furthermore, at low SNRs up to 7 dB, the BiLSTM_{(crossentropyex)}, BiLSTM_{(MAE)}, BiLSTM_{(SSE)}, LSTM_{(crossentropyex)}, LSTM_{(MAE)}, LSTM_{(SSE)}, CNN_{(crossentropyex)}, CNN_{(MAE)}, CNN_{(SSE)}, and MMSE CSEs achieve similar performance.

Because LS does not exploit channel statistics in the estimation phase, it performs poorly compared to MMSE. Conversely, MMSE exhibits superior performance by using second-order channel statistics, especially with sufficient pilots.

At 8 pilots, the BiLSTM_{(crossentropyex)}, BiLSTM_{(MAE)}, and BiLSTM_{(SSE)} CSEs beat the LSTM_{(crossentropyex)}, LSTM_{(MAE)}, LSTM_{(SSE)}, CNN_{(crossentropyex)}, CNN_{(MAE)}, CNN_{(SSE)}, and conventional CSEs at the examined SNRs. At low SNRs up to 7 dB, the presented BiLSTM_{(crossentropyex)}, BiLSTM_{(MAE)}, and BiLSTM_{(SSE)} CSEs deliver comparable performance. Furthermore, the developed BiLSTM_{(SSE)} beats the BiLSTM_{(crossentropyex)} and BiLSTM_{(MAE)} CSEs. It is also clear that the CNN_{(crossentropyex)}, CNN_{(MAE)}, and CNN_{(SSE)} estimators suffer from the limited pilots, providing poor performance compared to all examined estimators.

At 4 pilots, the results show the superiority of the BiLSTM_{(crossentropyex)}, BiLSTM_{(MAE)}, and BiLSTM_{(SSE)} estimators over the other examined estimators. They also show the superiority of BiLSTM_{(SSE)} over BiLSTM_{(crossentropyex)}, BiLSTM_{(MAE)}, LSTM_{(crossentropyex)}, LSTM_{(MAE)}, and LSTM_{(SSE)}. At SNRs up to 3 dB, the presented BiLSTM_{(crossentropyex)}, BiLSTM_{(MAE)}, and BiLSTM_{(SSE)} CSEs deliver comparable performance. It is also clear that the CNN_{(crossentropyex)}, CNN_{(MAE)}, and CNN_{(SSE)} estimators perform slightly better than the conventional estimators starting from 8 dB, but they still suffer from the limited pilots.

The given findings highlight the robustness of the BiLSTM-based CSEs against few pilots and a priori uncertainty in channel statistics. They also show the importance of evaluating multiple loss function-based classification layers during training to find the best architecture for each suggested estimator.

The BiLSTM_{(crossentropyex)}, BiLSTM_{(MAE)}, and BiLSTM_{(SSE)} CSEs have comparable performance. Also, the BiLSTM_{(SSE)} performance at 8 pilots is identical to the BiLSTM_{(crossentropyex)} performance at 64 pilots. Therefore, 5G OFDM systems should use the suggested estimator with limited pilots, i.e., BiLSTM_{(SSE)}, to considerably improve their data transmission rate. It is also clear that some loss functions are preferable to others in some situations. The proposed estimator is robust to a priori uncertainty in channel statistics since it uses a training-dataset-driven technique.

Exploring the training loss curves helps effectively check the quality of the DLNNs' training process. The loss curves deliver feedback on how the learning process is going, allowing one to determine whether it is worth continuing.

The accuracy of the suggested and other evaluated estimators measures how well they retrieve the transmitted data. Accuracy is the ratio of correctly received symbols to transmitted symbols. As mentioned in the previous subsection, the suggested estimators are trained under various conditions, and we want to see how well they perform on a new dataset. Therefore, the achieved accuracies for the investigated CSEs are presented in

Pilots of 64

| | BiLSTM | LSTM | CNN | MMSE | LS |
|---|---|---|---|---|---|
| Crossentropyex | 100 | 99.99 | 99.96 | 100 | 99.94 |
| SSE | 99.23 | 97.88 | 99.78 | 100 | 99.96 |
| MAE | 99.87 | 99.52 | 99.96 | 100 | 99.97 |

Pilots of 8

| | BiLSTM | LSTM | CNN | MMSE | LS |
|---|---|---|---|---|---|
| Crossentropyex | 99.84 | 99.53 | 26.21 | 91.34 | 91.62 |
| SSE | 100 | 99.95 | 27.20 | 91.60 | 91.49 |
| MAE | 100 | 99.94 | 26.44 | 91.53 | 91.50 |

Pilots of 4

| | BiLSTM | LSTM | CNN | MMSE | LS |
|---|---|---|---|---|---|
| Crossentropyex | 98.61 | 97.94 | 24.86 | 0.24 | 0.02 |
| SSE | 100 | 99.28 | 26.05 | 0.24 | 0.09 |
| MAE | 99.97 | 99.05 | 25.52 | 0.26 | 0.04 |

The proposed BiLSTM-based CSE achieves high accuracy, as shown in

The RNN (LSTM and BiLSTM)-based CSEs can assess enormous data sets, discover statistical correlations, construct relationships between features, and generalize the gained knowledge for new inputs. As a result, they are applicable to 5G and future systems.

The computational complexity of the CNN-based and RNN-based CSEs is provided empirically in this section in terms of training time. Training time can be defined as the time expended to obtain the best NN parameters (e.g., weights and biases) that minimise the error on a training dataset. Because it involves repeatedly evaluating the loss function with multiple parameter values, the training procedure is computationally complex.

| Pilots | Classification layer | BiLSTM (M:S) | LSTM (M:S) | CNN (H:M) |
|---|---|---|---|---|
| 64 | Crossentropyex | 10:13 | 8:02 | 15.72 |
| 64 | SSE | 10:48 | 6:57 | 16.54 |
| 64 | MAE | 10:43 | 6:32 | 17:39 |
| 8 | Crossentropyex | 9:14 | 6:09 | 15.95 |
| 8 | SSE | 8:18 | 7:40 | 15.37 |
| 8 | MAE | 9:01 | 7:24 | 16.47 |
| 4 | Crossentropyex | 8:33 | 7:53 | 16:02 |
| 4 | SSE | 7:43 | 7:11 | 14.95 |
| 4 | MAE | 7:23 | 7:10 | 16:30 |

The LSTM-based CSEs consume the lowest training time, followed by the BiLSTM-based CSEs, while the CNN-based CSEs consume the highest training time in all pilot scenarios with the same training parameters. The CNN-based CSEs' training time is in hours, which indicates their high computational complexity compared to their peers.

All presented DNN-based CSEs are online pilot-assisted estimators. The findings are summarised as follows:

At sufficient pilots = 64:

The proposed CNN_{(crossentropyex)}-based CSEs provide comparable performance to the RNNs_{(crossentropyex)}-based CSEs, as depicted in

The proposed CNN_{(MAE, and SSE)}-based CSEs outperform the RNNs_{(MAE, and SSE)}-based CSEs, as depicted in

The proposed CNN_{(crossentropyex, MAE, SSE)}-based CSEs outperform the traditionally used LS CSE, as depicted in

The proposed CNN_{(MAE, SSE)} CSEs provide comparable performance to the MMSE CSE at low SNRs, as depicted in

The proposed CNN-based CSEs are superior to the conventional estimators: the latter first estimate the channel state information explicitly and then detect/recover the transmitted symbols using the estimated information, while the proposed CNN-based CSEs estimate the channel information implicitly and recover the transmitted symbols directly.

At fewer pilots = 4:

The proposed CNN-based CSEs outperform the LS, and MMSE conventional estimators.

Generally:

The RNN-based CSEs beat the proposed CNN-based CSEs in terms of training time and achieved accuracies at pilots = 8 and 4, while they provide approximately the same accuracies at pilots = 64.

The best loss function is SSE (SSE-based classification layer), and the best RNN structure is BiLSTM_{(SSE)}, as illustrated in

Some loss functions are preferable to others in some situations.

The proposed CSEs are more suitable for communication systems with modeling errors or non-stationary channels, such as high-mobility vehicular systems and underwater acoustic communication systems.

Using other learning techniques such as Adadelta, Nadam, and Adagrad to investigate the proposed estimators' performance and accuracy. Using m-estimator robust statistics cost functions such as Huber, Tukey, Welch, and Cauchy to develop more robust classification layers. Developing other DNN-based CSEs by employing other recurrent networks such as the gated recurrent unit (GRU) and the simple recurrent unit (SRU). Studying the effectiveness of these CSEs using crossentropyex-, MAE-, and SSE-based classification layers.