This research focuses on gammatone frequency cepstral coefficients (GFCC), an effective but largely untapped feature, for detecting COVID-19 with a hybrid of the nature-inspired meta-heuristic deer hunting optimization algorithm and an artificial neural network (DHO-ANN). The noisy crowdsourced cough datasets were collected from the public domain. The results indicate that GFCC detects COVID-19 better than the widely used Mel-frequency cepstral coefficients (MFCC) in noisy crowdsourced speech corpora. The proposed algorithm's performance for detecting COVID-19 is validated using statistical measures, F1 score, confusion matrix, specificity, and sensitivity. The proposed algorithm using GFCC performs well in detecting COVID-19 from the noisy crowdsourced cough dataset COUGHVID. Moreover, the proposed algorithm and the selected feature parameters improve the detection of COVID-19 by 5% compared with existing methods.

COVID-19 spread like wildfire around the world after December 2019. The life-threatening disease was declared a global pandemic, and many researchers have worked on detecting coronavirus infection using Artificial Intelligence (AI) based Machine Learning (ML) techniques [

Grant et al. [

Miranda et al. [

This work emphasizes the detection of COVID-19 in a noisy crowdsourced cough dataset. Noise is unavoidable when crowdsourced speech data are collected for later COVID-19 analysis with AI and ML techniques, and the performance of AI-based machine learning techniques degrades greatly on noisy data. The focus is therefore on noise-robust speech features, gammatone frequency cepstral coefficients (GFCCs), which have not yet been exploited for detecting COVID-19 infection. Previously unexplored meta-heuristic-based machine learning algorithms are applied to detect COVID-19 infection using GFCC feature vectors from the noisy public-domain crowdsourced cough dataset.

The research objectives are:

To analyze GFCC features for detecting COVID-19 infection in a noisy cough dataset and compare the results with the features most commonly applied in COVID-19 detection.

To apply different, previously unexplored metaheuristic algorithms to the detection of COVID-19 from noisy crowdsourced cough data and compare their performance.

To propose a hybrid algorithm that combines the nature-inspired metaheuristic DHO algorithm with neural networks.

To perform a rigorous validation of the proposed method and the selected feature vectors.

The remainder of the manuscript is organized as follows: Section 2 discusses the crowdsourced cough dataset and the selection of feature parameters. Section 3 presents the nature-inspired meta-heuristic-based ANN. Section 4 analyzes the findings of this research work, and Section 5 gives the conclusion and future work.

Each WEBM or OGG file of the COUGHVID crowdsourced dataset was converted to WAV format, resampled to 16 kHz, and downmixed to a single (mono) channel.
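The text does not say which tool performed the conversion. As a minimal sketch, assuming the widely used `ffmpeg` command-line tool is available, the preprocessing could be expressed as follows. The function name `build_ffmpeg_command` and the example paths are hypothetical.

```python
from pathlib import Path


def build_ffmpeg_command(src, dst_dir, sample_rate=16000):
    """Return an ffmpeg argument list that converts one recording to mono WAV."""
    src = Path(src)
    dst = Path(dst_dir) / (src.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                      # overwrite an existing output file
        "-i", str(src),            # input WEBM or OGG recording
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample to the target rate (16 kHz here)
        str(dst),
    ]


# Hypothetical file names, used only to show the resulting command.
cmd = build_ffmpeg_command("recordings/sample.webm", "wav")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `ffmpeg` is installed.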

Feature extraction plays a crucial role in detecting COVID-19 infection from noisy cough data using AI-based machine learning techniques. It is found in a survey made by Deshpande et al. [

Pass each COUGHVID crowdsourced cough sound as an input signal through a 64-channel gammatone filterbank.

At each channel, rectify the filter response and decimate it to 100 Hz as a way of time windowing. Then take the absolute value, which creates a variant of the cochleagram as a time-frequency (T-F) representation.

Apply the cube root to the output of the previous step, i.e., the T-F representation.

Finally, apply the discrete cosine transform (DCT) to derive the cepstral features.
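The last two steps above can be sketched for a single time frame as follows. This is an illustration under stated assumptions, not the authors' implementation: the frame is assumed to hold one value per gammatone channel (64 here), the DCT is the unnormalized type-II form, and keeping 13 coefficients follows the feature count listed for GFCC in the comparison table. The function names are hypothetical.

```python
import math


def dct_ii(values):
    """Unnormalized type-II discrete cosine transform of a sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def gfcc_from_frame(channel_values, n_coeffs=13):
    """Cube-root compress one cochleagram frame, then keep the first DCT terms."""
    compressed = [abs(v) ** (1.0 / 3.0) for v in channel_values]
    return dct_ii(compressed)[:n_coeffs]


# A flat frame has no spectral shape, so only coefficient 0 is non-zero.
frame = [8.0] * 64
coeffs = gfcc_from_frame(frame)
```

In practice a library routine such as a fast DCT would replace the quadratic-time loop; the plain version is used here only to keep the sketch dependency-free.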

The filter is defined in the time domain by using impulse response, h(t), as in

where
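For reference, the gammatone impulse response is usually written in the following standard form. The symbols are the conventional ones and are assumed here rather than taken from this text: $a$ is the gain, $n$ the filter order, $b$ the bandwidth (decay rate), $f_c$ the center frequency, and $\varphi$ the phase.

```latex
h(t) = a\, t^{\,n-1}\, e^{-2\pi b t}\, \cos\!\left(2\pi f_c t + \varphi\right), \qquad t \ge 0
```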

GFCCs are based on the equivalent rectangular bandwidth (ERB) frequency scale, which is derived from human data on the ERB of the auditory filter, with a function as in

Glasberg et al. [

where
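As a hedged illustration, the widely cited Glasberg and Moore approximation gives the ERB in Hz as 24.7(4.37f/1000 + 1) for a center frequency f in Hz. The sketch below uses that formula and the matching ERB-rate scale to place 64 center frequencies. The 50 Hz to 8000 Hz range is an assumption (8000 Hz is the Nyquist frequency at the 16 kHz sampling rate), and the function names are hypothetical.

```python
import math


def erb_bandwidth(freq_hz):
    """Equivalent rectangular bandwidth in Hz at a center frequency in Hz."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def hz_to_erb_rate(freq_hz):
    """Map a frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37


def erb_center_frequencies(low_hz, high_hz, n_channels=64):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(low_hz), hz_to_erb_rate(high_hz)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_channels)]


centers = erb_center_frequencies(50.0, 8000.0)
print(round(erb_bandwidth(1000.0), 3))
```

Uniform spacing on the ERB-rate scale packs the channels densely at low frequencies and sparsely at high frequencies, mirroring the resolution of the auditory filter.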

The main purpose of deer hunting, a nature-inspired meta-heuristic algorithm, is to find the optimal position for the hunters to attack the deer by studying the behavior of the deer [

Each individual in the population attempts to reach the goal point (best position) by updating its position to a random location within the search space, as follows

where _{max}

The search space of the hunter is enhanced by incorporating the visualization angle of deer for a highly effective attack on the prey as

The position angle,

where,

Now, the position is updated by using the position angle as

The position is also updated on the successor position instead of the leader position as per the following

The above equation applies when the value of the vector L is greater than 1.

The algorithm for deer hunting optimization works as follows:
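The listing below is a simplified sketch of a deer-hunting-style search loop, not the authors' exact procedure. It follows the three kinds of update described above (relative to the leader, via a position angle, and relative to the successor), but the coefficient definitions, the thresholds used to choose between the updates, and all names (`dho_minimize`, `big_l`, `big_x`) are assumptions made for illustration.

```python
import math
import random


def dho_minimize(objective, dim, bounds, n_hunters=20, max_iter=100, seed=0):
    """Simplified deer-hunting-style search; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    low, high = bounds

    def clip(x):
        return min(max(x, low), high)

    hunters = [[rng.uniform(low, high) for _ in range(dim)]
               for _ in range(n_hunters)]
    ranked = sorted(hunters, key=objective)
    leader, successor = ranked[0][:], ranked[1][:]
    leader_fit = objective(leader)

    for it in range(1, max_iter + 1):
        for idx, pos in enumerate(hunters):
            p = rng.uniform(0.0, 2.0)        # random factor (assumed range)
            big_l = 2.0 * rng.random()       # coefficient L, a scalar in this sketch
            big_x = 0.25 * math.log(it + 1.0 / max_iter) * rng.uniform(-1.0, 1.0)
            if p < 1.0 and big_l >= 1.0:
                # Update relative to the leader (best position found so far).
                new = [leader[d] - big_x * p * abs(big_l * leader[d] - pos[d])
                       for d in range(dim)]
            elif p < 1.0:
                # Update through a position angle built from a visualization angle.
                angle = 2.0 * math.pi * rng.random() - (math.pi / 8.0) * rng.random()
                new = [leader[d] - p * abs(math.cos(angle) * leader[d] - pos[d])
                       for d in range(dim)]
            else:
                # Update relative to the successor instead of the leader.
                new = [successor[d] - big_x * p * abs(big_l * successor[d] - pos[d])
                       for d in range(dim)]
            hunters[idx] = [clip(v) for v in new]
            fit = objective(hunters[idx])
            if fit < leader_fit:
                # The previous leader becomes the successor.
                successor, leader, leader_fit = leader, hunters[idx][:], fit
    return leader, leader_fit


def sphere(v):
    """Toy objective with its minimum at the origin."""
    return sum(x * x for x in v)


best, fit = dho_minimize(sphere, dim=3, bounds=(-5.0, 5.0), max_iter=50)
```

The leader is kept as the best position seen so far, so the returned fitness never gets worse as iterations proceed.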

The focus now turns to blending the nature-inspired meta-heuristic DHO algorithm with a neural network algorithm to achieve better accuracy in COVID-19 detection than traditional approaches [

The values of weights and biases in artificial neural networks play a key role in the training phase of networks [

where “M” represents the number of outputs, “N” represents the number of training samples, and the desired and actual outputs are those of the i^{th} input unit when the k^{th} training sample is used. A lower MSE value indicates a better model.
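To make the role of the MSE concrete, the sketch below scores one candidate set of weights and biases, which is the quantity a meta-heuristic such as DHO would try to minimize. It assumes a network with a single hidden layer and sigmoid activations, and it averages over the N samples the squared error summed over the M outputs. The architecture, the flat weight layout, and the function names are assumptions, not details taken from the text.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, x, n_in, n_hidden, n_out):
    """One-hidden-layer network; `weights` is a flat list of weights and biases."""
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        w, b = weights[pos:pos + n_in], weights[pos + n_in]
        pos += n_in + 1
        hidden.append(_sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b))
    outputs = []
    for _ in range(n_out):
        w, b = weights[pos:pos + n_hidden], weights[pos + n_hidden]
        pos += n_hidden + 1
        outputs.append(_sigmoid(sum(wi * hi for wi, hi in zip(w, hidden)) + b))
    return outputs


def mse_fitness(weights, samples, targets, n_in, n_hidden, n_out):
    """Mean over N samples of the squared error summed over the M outputs."""
    total = 0.0
    for x, desired in zip(samples, targets):
        actual = forward(weights, x, n_in, n_hidden, n_out)
        total += sum((o - d) ** 2 for o, d in zip(actual, desired))
    return total / len(samples)


# 2 inputs, 2 hidden units, 1 output: 2*(2+1) + 1*(2+1) = 9 parameters.
zero_weights = [0.0] * 9
samples = [[0.2, 0.4], [0.9, 0.1]]
print(mse_fitness(zero_weights, samples, [[1.0], [0.0]], 2, 2, 1))
```

With all parameters at zero every output is 0.5, so the example evaluates to an MSE of 0.25 for targets of 1 and 0.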

Particle swarm optimization (PSO) is a bio-inspired algorithm used to search for an optimal solution in the solution space [
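For comparison with DHO, the textbook PSO update moves each particle using its own best position and the swarm's best position. The sketch below shows one such step. The inertia and acceleration coefficients are illustrative values, not settings reported in this work, and `pso_step` is a hypothetical name.

```python
import random


def pso_step(position, velocity, personal_best, global_best,
             inertia=0.7, c1=1.5, c2=1.5, rng=random):
    """One standard PSO update for a single particle."""
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        # Inertia term plus pulls toward the personal and global bests.
        v_next = inertia * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


# A particle at rest that already sits on both best positions does not move.
pos, vel = pso_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0])

# A particle at rest is pulled toward the global best.
pulled, _ = pso_step([0.0], [0.0], [0.0], [1.0], rng=random.Random(1))
```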

DHO-ANN algorithm to detect the COVID-19 infection is shown in flowchart

DHO-ANN, as a nature-inspired algorithm, depends on several parameters, and these parameters are set to values that improve the detection of COVID-19 infection in the noisy crowdsourced cough dataset. These parameters also play a crucial role in the stability, convergence, and robustness of the DHO-ANN algorithm.

The performance of the ANN modified by deer hunting optimization, a nature-inspired meta-heuristic algorithm, was evaluated in terms of statistical analysis and F1 score over the widely used MFCC feature parameters and the noise-robust GFCC feature parameters. Further, the proposed approach was compared with other existing models.

F1 score by feature | DHO-ANN | PSO-ANN | MVO-ANN | CS-ANN |
---|---|---|---|---|
MFCC | 0.80 | 0.66 | 0.56 | 0.46 |
GFCC | 0.91 | 0.77 | 0.72 | 0.69 |

It is observed from

Further, the performance of the proposed algorithm over MFCC and GFCC features is analyzed in terms of confusion matrices, which are represented in

Confusion matrix of DHO-ANN with MFCC features:

Actual ↓ / Predicted → | Negative | Positive |
---|---|---|
Negative (900) | TN = 736 | FP = 164 |
Positive (810) | FN = 157 | TP = 653 |

Confusion matrix of DHO-ANN with GFCC features:

Actual ↓ / Predicted → | Negative | Positive |
---|---|---|
Negative (900) | TN = 816 | FP = 84 |
Positive (810) | FN = 69 | TP = 741 |

The strength of the selected cepstral coefficients, MFCC and GFCC, was measured using correlation as the statistical measure. The correlation graph of MFCC and GFCC is shown in

It is observed from

The DHO-ANN algorithm was tested over the highly correlated GFCC features from the noisy cough dataset using different iteration ranges. It is observed that the DHO-ANN algorithm converges more smoothly in the range of 60 to 100 iterations, as shown in

Further, the evaluation explored the ability of DHO-ANN to detect COVID-19 from the noisy cough dataset in terms of specificity and sensitivity over the two feature parameters, as shown in

Feature parameters | Specificity | Sensitivity |
---|---|---|

MFCC | 0.82 | 0.81 |

GFCC | 0.90 | 0.91 |
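The reported specificity, sensitivity, and F1 score can be checked against the confusion-matrix counts given earlier. The short sketch below recomputes them using the standard definitions; the results agree with the reported figures to within about 0.01. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tn, fp, fn, tp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


mfcc = binary_metrics(tn=736, fp=164, fn=157, tp=653)
gfcc = binary_metrics(tn=816, fp=84, fn=69, tp=741)

for name, metrics in (("MFCC", mfcc), ("GFCC", gfcc)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

The GFCC accuracy computed this way is 1557/1710, which matches the 91.05% figure quoted from the confusion matrix.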

A comparison was made of the proposed algorithm for detecting COVID-19 with other existing models for the same purpose, shown in

Reference | Sound type | Features | No. of features | Classifier | AUC-ROC (%) |
---|---|---|---|---|---|
Grant [ | Cough | MFCC + Deltas | 60 | DNN | 68.36 |
Stasak [ | Vowel sound | Prosodic | 6 | Decision tree | 80 |
Dash [ | Cough | C-19CC | 13 | SVM | 85.57 |
Sharma [ | Heavy cough | Temporal | 9 | Random forest | 76 |
Proposed technique | Crowdsourced cough | GFCC | 13 | DHO-ANN | 91 |

This research observed that GFCC feature parameters performed better than MFCC feature parameters for detecting COVID-19 in noisy crowdsourced cough data. The experimental results also showed that GFCCs were more highly correlated feature parameters for COVID-19 detection than MFCCs. The proposed algorithm, DHO-ANN, yielded better results than other machine learning models such as PSO-ANN, MVO-ANN, and CS-ANN. Using GFCC features from the noisy crowdsourced cough dataset, COUGHVID, DHO-ANN achieved an AUC-ROC of 0.914, and the confusion matrix gives an accuracy of 91.05% over GFCC feature parameters. The proposed algorithm's performance for detecting COVID-19 was validated using statistical measures, F1 score, confusion matrix, specificity, and sensitivity. The specificity and sensitivity of DHO-ANN over GFCC feature parameters were 0.90 and 0.91, respectively. The proposed technique with the selected feature parameters improved the detection of COVID-19 by at least 5% compared with other existing methods and feature parameters. Future work may examine the impact of GFCC on detecting COVID-19 in the noisy crowdsourced cough dataset using a nature-inspired meta-heuristic-based Deep Neural Network (DNN) and a convolutional auto-encoder neural network.

The authors received no specific funding for this study.

The authors declare they have no conflicts of interest to report regarding the present study.