# Junding Sun and Xiang Li contributed equally to this paper and should be considered co-first authors
COVID-19, caused by the novel coronavirus, is highly infectious. The main symptoms of this disease are fever, dry cough, and fatigue [
The current diagnostic methods for COVID-19 are: (i) nucleic acid testing. This method takes a long time from sample collection to the release of test results and places high requirements on the sample-testing environment. In addition, samples are vulnerable to contamination; hence, sampling may fail and the collection operation may need to be repeated [
At present, a growing number of scholars have applied deep learning technology to the diagnosis of COVID-19. Xu et al. [
Narin et al. [
To solve the above problems, our team proposed a novel framework and three methods to diagnose COVID-19 based on chest X-ray images. The contributions of this study are as follows:
A novel framework, Biogeography-based optimization Expert-VGG (BEVGG), is proposed.
Three novel methods based on the BEVGG framework, BEVGGC-I, BEVGGC-II, and BEVGGC-III, are proposed.
We find that our three methods are superior to state-of-the-art methods, and that BEVGGC-I performs best of all.
The experimental dataset used in this paper is a public dataset from the Kaggle website [
According to
To ease the understanding of this paper,
Biogeography-based optimization (BBO) is an efficient optimization algorithm suitable for solving high-dimensional and multi-objective optimization problems [
The BBO optimization process can be divided into two stages: migration and mutation. The migration stage can be further divided into two operations: immigration and emigration. When the algorithm performs the migration operation, it determines the immigration SIV and the emigration SIV according to the migration rates [
Here,
Both migration and mutation in BBO improve the diversity of the solutions being optimized. Mutation is usually performed after migration. At the same time, elitism is introduced to better preserve the optimal solution generated in each iteration. As described above, BBO has the advantages of few parameters, simple operation, and fast convergence, so it is well suited to optimizing multiple CNN hyperparameters at once.
When we use BBO to optimize the hyperparameters of a CNN, the correspondence between BBO variables and CNN variables is as follows: an SIV represents the value of one hyperparameter to be optimized; the objective function of BBO represents the CNN; the HSI represents the output of the CNN; and a habitat represents one candidate solution containing all the hyperparameters to be optimized. The optimization is completed by running BBO through successive iterations.
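As a rough illustration of this mapping, the following Python sketch implements a minimal BBO loop over integer-valued SIVs such as kernel sizes and strides. The fitness function standing in for the CNN's HSI is a hypothetical placeholder, not the paper's actual network, and the migration/mutation details are simplified assumptions:

```python
import random

def bbo_optimize(fitness, bounds, n_habitats=10, n_iters=20, p_mutate=0.05, n_elites=2):
    """Minimal BBO sketch: each habitat is one candidate hyperparameter vector
    (its SIVs); fitness plays the role of HSI (higher is better)."""
    habitats = [[random.randint(lo, hi) for lo, hi in bounds] for _ in range(n_habitats)]
    for _ in range(n_iters):
        habitats.sort(key=fitness, reverse=True)        # best (highest HSI) first
        elites = [h[:] for h in habitats[:n_elites]]    # elitism: keep best solutions
        for i, h in enumerate(habitats):
            immigration = (i + 1) / n_habitats          # poorer habitats immigrate more
            for s in range(len(bounds)):
                if random.random() < immigration and i > 0:
                    # emigration: copy an SIV from a better-ranked habitat
                    donor = habitats[random.randint(0, i - 1)]
                    h[s] = donor[s]
                if random.random() < p_mutate:          # mutation keeps diversity
                    lo, hi = bounds[s]
                    h[s] = random.randint(lo, hi)
        habitats[-n_elites:] = elites                   # restore elites after migration
    return max(habitats, key=fitness)

# Toy fitness standing in for CNN validation accuracy (hypothetical):
# six SIVs, e.g. three kernel sizes and three strides, each in [1, 7].
best = bbo_optimize(lambda h: -sum((x - 3) ** 2 for x in h), bounds=[(1, 7)] * 6)
```

In a real run, evaluating one habitat's HSI would mean training or fine-tuning the CNN with that habitat's hyperparameters and measuring its validation performance.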
In this section, our team proposes a novel framework for the detection of COVID-19 based on chest X-ray images, which consists of convolutional layers, pooling layers, and fully connected layers.
The function of the convolution operation is to extract image features from input images. The steps of the convolution operation are [
Here,
Therefore, the convolution operation has the characteristics of local connection and weight sharing, which enable efficient and fast feature extraction from images.
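The local-connection and weight-sharing idea can be sketched as a naive valid convolution in NumPy; this is an illustrative toy, not the paper's implementation, and the edge-detection kernel is a hypothetical example:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Naive valid convolution: the same kernel weights (weight sharing) slide
    over every local receptive field (local connection)."""
    k = kernel.shape[0]
    out = (image.shape[0] - k) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            result[i, j] = np.sum(patch * kernel)   # one output per receptive field
    return result

img = np.arange(36, dtype=float).reshape(6, 6)
edge = np.array([[1., 0., -1.]] * 3)                # toy vertical-edge kernel
print(conv2d(img, edge).shape)                      # (4, 4): (6 - 3) / 1 + 1
```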
The image features extracted by the convolution operation could be fed directly to the fully connected layer, but doing so would force the fully connected layer to handle a huge number of calculations and limit the performance of the model. Therefore, we introduce the pooling operation to reduce the dimensionality of the extracted features and reduce the risk of overfitting [
In the max-pooling operation, the maximum value of each receptive field is selected as the output of the pooling area. Max-pooling therefore better preserves texture information and reduces the deviation of the estimated mean caused by convolution parameter errors.
The output size after the pooling operation is calculated as follows:
Here,
In the average pooling operation, the average of all values in each receptive field is calculated as the output of the pooling area. Therefore, the average pooling operation can better preserve the overall features of the image, highlight the background information of the image, and reduce the increase in estimation variance caused by the limited neighborhood size.
As described above, the pooling operation reduces both the number of parameters and the risk of overfitting, and it also provides translation invariance.
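Both pooling modes can be sketched in a few lines, assuming the usual output-size relation (W − F)/S + 1 (the input array below is a toy example):

```python
import numpy as np

def pool2d(x, k=2, stride=2, mode="max"):
    """Pooling sketch: max keeps the strongest response per receptive field,
    average keeps the mean; output size follows (W - F) / S + 1."""
    out = (x.shape[0] - k) // stride + 1
    y = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            y[i, j] = patch.max() if mode == "max" else patch.mean()
    return y

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 8., 3., 2.],
              [7., 6., 1., 0.]])
print(pool2d(x, mode="max"))   # [[4. 8.], [9. 3.]]
print(pool2d(x, mode="avg"))   # [[2.5 6.5], [7.5 1.5]]
```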
According to Section 3.2.1, when a standard convolution kernel performs the convolution operation, all channels of the input image within the kernel's receptive field are computed at the same time, and one convolution kernel can extract only one feature. To extract more features from the input images, the number of convolution kernels must be increased accordingly.
The advantage of the DSC operation is that it significantly reduces the number of parameters in the convolution operation.
The parameter counts of standard convolution and DSC are calculated as follows:
Here,
According to the above, a convolution using DSC has fewer parameters than one using standard convolution. The more convolution kernels a model uses, the greater the parameter savings of DSC over standard convolution, and the more significant its advantage becomes.
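The parameter comparison can be made concrete with a small calculation (bias terms omitted; the layer sizes below are hypothetical examples, not the paper's exact configuration):

```python
def standard_conv_params(k, c_in, c_out):
    """Standard convolution: each of the c_out kernels spans all c_in channels."""
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    """Depthwise separable convolution: one k x k depthwise filter per input
    channel, then 1 x 1 pointwise filters to mix channels."""
    return k * k * c_in + 1 * 1 * c_in * c_out

# Example: a 3x3 convolution from 64 to 128 channels (hypothetical sizes).
print(standard_conv_params(3, 64, 128))  # 73728
print(dsc_params(3, 64, 128))            # 576 + 8192 = 8768
```

For this example DSC needs roughly 12% of the standard convolution's parameters, and the ratio shrinks further as the channel counts grow.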
In Section 3.2.1 and Section 3.3, we described the convolution operations of standard convolution and DSC, both of which have continuous receptive fields. In this part, we introduce a convolution with a discontinuous receptive field: dilated convolution [
The receptive field of a single dilated convolution is calculated as follows:
Here,
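Assuming the standard relation k_eff = k + (k − 1)(d − 1) for the effective receptive field of a kernel of size k with dilation rate d, a quick check:

```python
def dilated_receptive_field(k, d):
    """Effective receptive field of a single dilated convolution:
    k + (k - 1) * (d - 1); d = 1 recovers a standard kernel."""
    return k + (k - 1) * (d - 1)

print(dilated_receptive_field(3, 1))  # 3: standard 3x3 convolution
print(dilated_receptive_field(3, 2))  # 5: a 3x3 kernel sees a 5x5 area
print(dilated_receptive_field(3, 4))  # 9
```

Dilation thus enlarges the receptive field without adding parameters, at the cost of skipping pixels inside it.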
In Section 3.5, we discuss the three methods (BEVGGC-I, BEVGGC-II, and BEVGGC-III) proposed by our team based on the BEVGG framework.
In this section, we describe the three methods proposed by our team (BEVGGC-I, BEVGGC-II, and BEVGGC-III). Their structures are shown in
In the structure of BEVGGC-I, all convolutional layers are standard convolutional layers, and BBO is used to optimize the kernel size and stride of the first three convolutional layers. The other parameters of the model (weights and biases) are simultaneously optimized by back-propagation. BEVGG has five convolutional layers; owing to experimental constraints, we chose the hyperparameters of the first three convolutional layers for optimization.
When BBO optimizes the BEVGGC-I hyperparameters, each kernel size and stride in the first three convolutional layers serves as an SIV, the array of all SIVs forms a habitat, and the value of the CNN output serves as the HSI of each habitat. By continuously changing the candidate value of each SIV over successive iterations, the BBO-optimized output is obtained. The subsequent BBO optimizations of model hyperparameters follow this method.
In the structure of BEVGGC-II, all convolutional layers are depthwise separable convolutional layers, and BBO is used to optimize the kernel size and stride of the first three convolutional layers. The other parameters of the model (weights and biases) are simultaneously optimized by back-propagation.
In the structure of BEVGGC-III, the first three convolutional layers are dilated convolutional layers and the remaining convolutional layers are depthwise separable convolutional layers. To make the model perform better, we set the dilation rate to 2 through trial and error. BBO is again used to optimize the kernel size and stride of the first three convolutional layers, while the other parameters of the model (weights and biases) are simultaneously optimized by back-propagation.
To compare the diagnostic performance of different methods, confusion matrix [
Confusion matrix  Predicted class  

Positive  Negative  
Actual class  Positive  TP  FN 
Negative  FP  TN 
In addition, we define five metrics: Accuracy, Precision, Sensitivity, Specificity, and F_{1} score.
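All five metrics follow directly from one class's confusion-matrix counts; a small sketch (the counts below are toy values, not the paper's results):

```python
def metrics(tp, fn, fp, tn):
    """The five evaluation metrics computed from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f1

# Toy counts for one class (hypothetical).
print(metrics(tp=90, fn=10, fp=5, tn=95))
```

For the three-class task, TP/FN/FP/TN are taken per class in a one-vs-rest fashion, and each metric is reported per class as in the result tables.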
To prove the robustness of the methods, 10 runs of 10-fold cross-validation are introduced. We divided the experimental dataset randomly according to the 10-fold cross-validation scheme, obtaining 10 equal-sized subsets with different data distributions from the original experimental dataset. For each run, two of the subsets are selected as the test set, and the remaining eight subsets serve as the training set. Our experiments are carried out on these splits.
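One way such a splitting scheme could be sketched is shown below; the exact pairing of the two held-out subsets per run is an assumption, not something the paper specifies:

```python
import random

def ten_fold_splits(n_samples, seed=0):
    """Shuffle once, cut into 10 equal folds, then for each run hold out two
    folds as the test set and train on the remaining eight (assumed pairing)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    size = n_samples // 10
    folds = [idx[i * size:(i + 1) * size] for i in range(10)]
    splits = []
    for r in range(10):                          # one split per run
        test = folds[r] + folds[(r + 1) % 10]    # two folds held out for testing
        held = set(test)
        train = [i for f in folds for i in f if i not in held]
        splits.append((train, test))
    return splits

splits = ten_fold_splits(100)   # hypothetical sample count
```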
According to the above contents, an ideal confusion matrix of our experiment is as
Confusion matrix  Predicted class  

COV  Nor  Pne  
Actual class  COV  463  0  0 
Nor  0  463  0  
Pne  0  0  463 
An ideal confusion matrix over 10 runs of 10-fold cross-validation is shown as
Confusion matrix  Predicted class  

COV  Nor  Pne  
Actual class  COV  4630  0  0 
Nor  0  4630  0  
Pne  0  0  4630 
To better understand the performance of the different methods, Gradient-weighted Class Activation Mapping (Grad-CAM) is introduced. By analyzing the Grad-CAM results on the original chest X-ray images for each method, we can compare the detection performance of the different methods more intuitively.
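The core Grad-CAM computation can be sketched in a few lines of NumPy, given a convolutional layer's activations and the gradients of the class score with respect to them; the arrays below are random stand-ins, not real network outputs:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM sketch: weight each feature map by its spatially averaged
    gradient, sum the weighted maps, and keep only positive evidence (ReLU)."""
    weights = gradients.mean(axis=(1, 2))              # alpha_k: GAP over space
    cam = np.tensordot(weights, activations, axes=1)   # sum_k alpha_k * A_k
    cam = np.maximum(cam, 0)                           # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalize for display
    return cam

# Toy activations/gradients for 4 feature maps of size 7x7 (hypothetical).
rng = np.random.default_rng(0)
A = rng.random((4, 7, 7))
G = rng.random((4, 7, 7))
cam = grad_cam(A, G)
```

The resulting map is then upsampled to the input resolution and overlaid on the chest X-ray image as a heatmap.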
In this part, we compare, analyze, and discuss the experimental results. We also compare our methods with state-of-the-art methods.
In this section, the results of ablation experiments on BEVGGC are presented to prove the rationality of the structure proposed by our team (13 feature-extraction layers and 3 fully connected layers).
In
C1  M1  C2  M2  C3  M3  C4  M4  

Kernel  5  2  3  2  3  2  3  2 
Stride  1  2  1  2  1  2  1  2 
Channels  3/16  16/32  32/64  64/128  
BEVGGC (82)  FL1  FL2  
Neurons  120  3  
BEVGGC (83)  FL1  FL2  FL3  
Neurons  120  60  3  
BEVGGC (84)  FL1  FL2  FL3  FL4  
Neurons  120  80  40  3 
Confusion matrix  Predicted class (2 layers of FL)  Predicted class (3 layers of FL)  Predicted class (4 layers of FL)  

COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  
Actual class  COV  4456  97  77  4458  90  82  4485  76  69 
Nor  145  4347  138  147  4357  126  158  4325  147  
Pne  76  210  4344  65  225  4340  103  198  4329 
As can be seen from
Accuracy  Precision  Sensitivity  Specificity  F_{1} scores  

COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  
82  97.16  95.75  96.39  95.27  93.40  95.28  96.24  93.89  93.82  97.61  96.68  97.68  95.76  93.64  94.55 
83  97.24  95.77  96.41  95.46  93.26  95.43  96.29  94.10  93.74  97.71  96.60  97.75  95.87  93.68  94.57 
84  97.08  95.83  96.28  94.50  94.04  95.25  96.87  93.41  93.50  97.18  97.04  97.67  95.67  93.73  94.36 
C1  M1  C2  M2  C3  M3  C4  M4  C5  M5  

Kernel  5  2  3  2  3  2  3  2  3  2 
Stride  1  2  1  2  1  2  1  2  1  2 
Channels  3/16  16/32  32/64  64/128  128/256  
BEVGGC (102)  FL1  FL2  
Neurons  120  3  
BEVGGC (103)  FL1  FL2  FL3  
Neurons  120  60  3  
BEVGGC (104)  FL1  FL2  FL3  FL4  
Neurons  120  80  40  3 
Confusion matrix  Predicted class (2 layers of FL)  Predicted class (3 layers of FL)  Predicted class (4 layers of FL)  

COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  
Actual class  COV  4502  70  58  4433  144  53  4523  75  32 
Nor  208  4319  103  93  4456  81  141  4383  106  
Pne  84  176  4370  75  72  4483  54  203  4373 
Accuracy  Precision  Sensitivity  Specificity  F_{1} scores  

COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  
102  96.98  95.99  96.97  93.91  94.61  96.45  97.24  93.28  94.38  96.85  97.34  98.26  95.54  93.94  95.40 
103  97.37  97.19  97.98  96.35  95.38  97.10  95.75  96.24  96.83  98.19  97.67  98.55  96.05  95.81  96.96 
104  97.83  96.22  97.16  95.87  94.04  96.94  97.69  94.67  94.45  97.89  97.00  98.51  96.77  94.35  95.68 
C1  M1  C2  M2  C3  M3  C4  M4  C5  M5  C6  M6  

Kernel  5  2  3  2  3  2  3  2  3  2  3  2 
Stride  1  2  1  2  1  2  1  2  1  2  1  2 
Channels  3/16  16/32  32/64  64/128  128/256  256/512  
BEVGGC (122)  FL1  FL2  
Neurons  120  3  
BEVGGC (123)  FL1  FL2  FL3  
Neurons  120  60  3  
BEVGGC (124)  FL1  FL2  FL3  FL4  
Neurons  120  80  40  3 
Confusion matrix  Predicted class (2 layers of FL)  Predicted class (3 layers of FL)  Predicted class (4 layers of FL)  

COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  
Actual class  COV  4530  47  53  4488  87  55  4547  32  51 
Nor  198  4341  91  131  4364  135  198  4329  103  
Pne  67  116  4447  34  112  4484  90  109  4431 
As can be seen from
As can be seen from
Accuracy  Precision  Sensitivity  Specificity  F_{1} scores  

COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  
122  97.37  96.75  97.65  94.47  96.38  96.86  97.84  93.76  96.05  97.14  98.24  98.44  96.13  95.05  96.45 
123  97.79  96.65  97.58  96.45  95.64  95.93  96.93  94.25  96.85  98.22  97.85  97.95  96.70  94.94  96.39 
124  97.33  96.82  97.46  94.04  96.85  96.64  98.21  93.50  95.70  96.89  98.48  98.34  96.08  95.14  96.17 
As can be seen from
C1  M1  C2  M2  C3  M3  C4  M4  C5  M5  

Not optimized by BBO  
Kernel  5  2  3  2  3  2  3  2  3  2 
Stride  1  2  1  2  1  2  1  2  1  2 
Optimized by BBO  
Kernel  3  2  1  2  3  2  3  2  3  2 
Stride  2  2  1  2  1  2  1  2  1  2 
Confusion matrix  Predicted class (with BBO)  Predicted class (without BBO)  

COV  Nor  Pne  COV  Nor  Pne  
Actual class  COV  4498  98  34  4483  72  75 
Nor  63  4475  92  81  4456  93  
Pne  25  28  4577  53  144  4433 
Accuracy  Precision  Sensitivity  Specificity  F_{1} scores  

COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  
With BBO  98.42  97.98  98.71  98.08  97.26  97.32  97.15  96.65  98.86  99.05  98.64  98.64  97.61  96.96  98.08 
Without BBO  97.98  97.19  97.37  97.10  95.38  96.35  96.83  96.24  95.75  98.55  97.67  98.19  96.96  95.81  96.05 
C1  M1  C2  M2  C3  M3  C4  M4  C5  M5  

Not optimized by BBO  
Kernel  5  2  3  2  3  2  3  2  3  2 
Stride  1  2  1  2  1  2  1  2  1  2 
Optimized by BBO  
Kernel  5  2  3  2  1  2  3  2  3  2 
Stride  2  2  2  2  1  2  1  2  1  2 
C1  M1  C2  M2  C3  M3  C4  M4  C5  M5  

Not optimized by BBO  
Kernel  5  2  3  2  3  2  3  2  3  2 
Stride  1  2  1  2  1  2  1  2  1  2 
Dilated rate  1  2  4  
Optimized by BBO  
Kernel  3  2  3  2  3  2  3  2  3  2 
Stride  1  2  1  2  1  2  1  2  1  2 
Dilated rate  1  2  4 
Confusion matrix  BEVGGC-II predicted class  BEVGGC-III predicted class  

COV  Nor  Pne  COV  Nor  Pne  
Actual class  COV  4352  219  59  4312  213  105 
Nor  206  4294  130  116  4312  202  
Pne  52  102  4476  42  39  4549 
As seen from
Accuracy  Precision  Sensitivity  Specificity  F_{1} scores  

COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  COV  Nor  Pne  
BEVGGC-II  96.14  95.27  97.53  94.40  93.04  95.95  94.00  92.74  96.67  97.21  96.53  97.96  94.20  92.89  96.31 
BEVGGC-III  96.57  95.90  97.21  96.57  94.48  93.68  93.13  93.13  98.25  98.29  97.27  96.69  94.77  93.80  95.91 
It can be seen from
It can be seen from
Run  2 fully connected layers  3 fully connected layers  4 fully connected layers 

OA  OA  OA  
1  95.72  94.90  94.40 
2  94.88  93.79  94.53 
3  96.15  93.98  95.10 
4  93.89  94.71  93.69 
5  94.84  94.65  94.59 
6  93.91  95.04  93.83 
7  94.01  95.26  94.69 
8  95.13  94.87  95.61 
9  94.21  94.95  94.97 
10  93.72  94.56  95.07 
Average  94.61 ± 0.54  94.76 ± 0.44  94.65 ± 0.49 
It can be seen from
Run  2 fully connected layers  3 fully connected layers  4 fully connected layers 

OA  OA  OA  
1  95.45  96.04  95.23 
2  94.77  95.10  95.62 
3  95.31  97.48  95.49 
4  94.97  96.76  95.39 
5  94.96  95.46  96.23 
6  95.61  96.62  95.77 
7  95.44  95.90  95.28 
8  94.89  96.62  95.38 
9  94.76  95.97  95.60 
10  94.74  96.76  95.82 
Average  95.09 ± 0.31  96.27 ± 0.67  95.58 ± 0.28 
It can be seen from
Run  2 fully connected layers  3 fully connected layers  4 fully connected layers 

OA  OA  OA  
1  95.26  95.37  95.61 
2  95.21  96.24  96.31 
3  95.88  95.65  95.77 
4  96.43  95.66  95.46 
5  96.60  96.98  96.01 
6  94.59  94.74  95.41 
7  95.93  96.87  96.26 
8  96.27  96.59  94.98 
9  96.18  96.22  95.57 
10  95.66  95.19  96.45 
Average  95.80 ± 0.59  95.95 ± 0.71  95.78 ± 0.44 
It can be seen from
Run  With BBO  Without BBO 

OA  OA  
1  98.10  96.04 
2  96.82  95.10 
3  97.26  97.48 
4  98.04  96.76 
5  98.29  95.46 
6  96.31  96.62 
7  97.40  95.90 
8  98.12  96.62 
9  97.88  95.97 
10  98.31  96.76 
Average  97.65 ± 0.65  96.27 ± 0.67 
In this section, we list the OA of our three methods (BEVGGC-I, BEVGGC-II, and BEVGGC-III). Detailed data are shown in
Run  BEVGGC-I  BEVGGC-II  BEVGGC-III 

OA  OA  OA  
1  98.10  94.86  94.84 
2  96.82  94.61  95.10 
3  97.26  94.43  95.30 
4  98.04  94.16  94.74 
5  98.29  94.47  94.95 
6  96.31  94.39  93.44 
7  97.40  94.41  95.28 
8  98.12  94.74  94.76 
9  97.88  94.16  95.18 
10  98.31  94.64  94.45 
Average  97.65 ± 0.65  94.49 ± 0.22  94.81 ± 0.52 
It can be seen from
In this section, we analyze the Grad-CAM results of our methods, which help us understand the performance of the different methods more intuitively.
In this section, we compare our three methods with two state-of-the-art methods (VGG16 [
Approach  OA 

VGG16  88.06 ± 1.69 
ResNet18  90.81 ± 0.72 
BEVGGC-I  97.65 ± 0.65 
BEVGGC-II  94.49 ± 0.22 
BEVGGC-III  94.81 ± 0.52 
Variable name  Variable meaning 

Probability that the habitat's species number is 

Maximum probability of species.  
Output size of convolution operation.  
Output size of pooling operation.  
Input size of convolution operation.  
Input size of pooling operation.  
Convolution kernel size.  
Size of conventional convolution kernel.  
Size of convolution kernel in step 1 of DSC.  
Pooling kernel size.  
Number of convolution kernels.  
Number of channels of input images.  
Rate of immigration when the species number is 

Maximum rate of immigration.  
Receptive field size of dilated convolution.  
Size of dilation rate.  
Padding size.  
Rate of mutation when the species number is 

Maximum rate of mutation.  
Rate of emigration when the species number is 

Maximum rate of emigration.  
Number of parameters in convolution operation.  
Number of parameters in standard convolution operation.  
Number of parameters in DSC operation.  
Number of species.  
Stride size of the convolution kernel.  
Stride size of pooling kernel. 
Abbreviation  Full definition 

BBO  Biogeography-Based Optimization. 
BEVGG  Biogeography-Based Optimization Expert-VGG. 
BEVGGC  Biogeography-Based Optimization Expert-VGG for COVID-19 diagnosis. 
CNN  Convolutional Neural Network. 
CCT  Chest Computed Tomography. 
DSC  Depthwise Separable Convolution. 
FL  Fully Connected Layer. 
FN  False Negative. 
FP  False Positive. 
Grad-CAM  Gradient-weighted Class Activation Mapping. 
Nor  Chest X-ray images of normal subjects. 
OA  Overall accuracy. 
ReLU  Rectified Linear Unit. 
TN  True Negative. 
TP  True Positive. 
Pne  Chest X-ray images of pneumonia patients. 
As can be seen from
In this paper, we proposed three methods (BEVGGC-I, BEVGGC-II, and BEVGGC-III) to diagnose COVID-19 based on chest X-ray images and used BBO to optimize the hyperparameter values of the methods. Among them, BEVGGC-I (97.65% ± 0.65%) has the best detection performance, followed by BEVGGC-III (94.81% ± 0.52%) and then BEVGGC-II (94.49% ± 0.22%). The experimental results show that all of our methods are superior to the state-of-the-art methods in terms of OA.
In future work, we will continue to develop artificial intelligence systems for medical-image-aided diagnosis using deep learning technology. Our main research directions are: (i) we will try to use more optimization methods [