In the deep learning approach to identifying plant diseases, the high complexity of the network model, the large number of parameters, and the great computational effort make it challenging to deploy such models on terminal devices with limited computational resources. In this study, a lightweight method for plant disease identification, an improved version of the ShuffleNetV2 model, is proposed. In the proposed model, the depthwise convolution in the basic module of ShuffleNetV2 is replaced with mixed depthwise convolution to capture crop pest image features at different resolutions; the efficient channel attention module is added to the ShuffleNetV2 network structure to enhance the channel features; and the ReLU activation function is replaced with the ReLU6 activation function to prevent the generation of large gradients. Experiments are conducted on the public PlantVillage dataset. The results show that the proposed model achieves an accuracy of 99.43%, an improvement of 0.6 percentage points over the ShuffleNetV2 model. Compared to lightweight network models, such as MobileNetV2, MobileNetV3, EfficientNet, and EfficientNetV2, and classical convolutional neural network models, such as ResNet34, ResNet50, and ResNet101, the proposed model has fewer parameters and higher recognition accuracy, which provides guidance for deploying crop pest identification methods on resource-constrained devices, including mobile terminals.

In agricultural production, plant diseases are a leading cause of crop yield reduction. In practice, the identification of plant diseases mainly depends on farmers’ long-term experience. For large agricultural lands with a variety of crops, manual identification of plant diseases is time-consuming and laborious; it is also time sensitive, covers only a small detection range, and is not reliable. The use of computer vision to analyze images of crop leaves to identify plant diseases has good application prospects in agricultural production. Numerous scholars have attempted to use deep learning methods to identify crop pests and diseases, assist in the prevention and diagnosis of plant diseases, and promote the rapid development of agriculture [

Krishnamoorthy et al. [

The above studies employed classical convolutional neural networks (CNNs) to improve the crop pest and disease identification accuracy. The accuracy of classical CNN models, such as AlexNet [

Based on the above problems, this study improves on ShuffleNetV2, aiming to improve the recognition accuracy of the model while keeping it lightweight. The key contributions of this study are as follows:

The depthwise convolution in the basic module of ShuffleNetV2 is replaced with mixed depthwise convolution (MixDWConv) to capture crop pest images at different resolutions.

The efficient channel attention (ECA) module is added to the ShuffleNetV2 model network structure to enhance the channel features.

The ReLU6 activation function is introduced to prevent the generation of large gradients.

The proposed lightweight CNN is highly suitable for deploying the model on embedded resource-constrained devices, such as mobile terminals, which assists in realizing the accurate identification of plant diseases in real time. Additionally, it has robust engineering utility and high research value.

The remainder of this paper is structured as follows. Section 2 presents the literature review and the baseline model. Section 3 describes the proposed model. Section 4 discusses the experimental results and ablation study. Finally, Section 5 presents the conclusions.

Mohanty et al. [

Sun et al. [ and the average disease recognition accuracy was 99.24%. Guo et al. [

The ShuffleNetV1 network is a high-performance lightweight CNN proposed by the Megvii Technology team in 2017. The essential metrics for neural network architecture design include not only computational complexity [

Layer | Output size | Kernel size | Stride | Repeat | Output channels
---|---|---|---|---|---
Image | 224 × 224 | | | | 3
Conv1 | 112 × 112 | 3 × 3 | 2 | 1 | 24
Max pool | 56 × 56 | 3 × 3 | 2 | | 24
Stage2 | 28 × 28 | | 2 | 1 | 116
| 28 × 28 | | 1 | 3 | 116
Stage3 | 14 × 14 | | 2 | 1 | 232
| 14 × 14 | | 1 | 7 | 232
Stage4 | 7 × 7 | | 2 | 1 | 464
| 7 × 7 | | 1 | 3 | 464
Conv5 | 7 × 7 | 1 × 1 | 1 | 1 | 1024
Global pool | 1 × 1 | 7 × 7 | | | 1024
FC | | | | | 1000

The ShuffleNetV2 network includes the Conv1 layer, Max Pool layer, Stage2 layer, Stage3 layer, Stage4 layer, Conv5 layer, Global Pool layer, and FC layer. The Stage2, Stage3, and Stage4 layers comprise stacked basic modules. The Stage2 and Stage4 layers are stacked with four basic modules, and the Stage3 layer is stacked with eight basic modules. The first basic module in each stage has a stride size of 2, which is mainly used for downsampling, and the other basic modules have a stride size of 1.

Depthwise separable convolution [

The number of multiplications of the standard convolution is computed as

D_{k} × D_{k} × M × N × D_{F} × D_{F},

where D_{k} is the size of the convolution kernel, M is the number of input feature channels, N is the number of output feature channels, and D_{F} is the size of the output feature map.

The number of parameters of the standard convolution is

D_{k} × D_{k} × M × N.

The number of multiplications of the depthwise separable convolution is computed as

D_{k} × D_{k} × M × D_{F} × D_{F} + M × N × D_{F} × D_{F}.

The number of parameters of the depthwise separable convolution is

D_{k} × D_{k} × M + M × N.

The ratio of the multiplications of the depthwise separable convolution to those of the standard convolution is

(D_{k} × D_{k} × M × D_{F} × D_{F} + M × N × D_{F} × D_{F}) / (D_{k} × D_{k} × M × N × D_{F} × D_{F}) = 1/N + 1/D_{k}^{2}.

The ratio of the number of parameters of the depthwise separable convolution to that of the standard convolution is

(D_{k} × D_{k} × M + M × N) / (D_{k} × D_{k} × M × N) = 1/N + 1/D_{k}^{2}.

N is the number of output channels and is typically large; thus, the 1/N term is negligible. D_{k} is the size of the convolution kernel, which is typically set as 3, so the depthwise separable convolution requires approximately 1/9 of the computation and 1/9 of the parameters of the standard convolution. Compared to the traditional convolution operation, the depthwise separable convolution reduces the number of parameters and improves the model training speed.
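The roughly ninefold reduction can be checked numerically; the following is a minimal sketch in which the function names are illustrative and the channel count (116) is borrowed from the Stage2 layer of the architecture table:

```python
def standard_conv_params(d_k, m, n):
    # Standard convolution: one d_k x d_k x m kernel per output channel
    return d_k * d_k * m * n

def dws_conv_params(d_k, m, n):
    # Depthwise stage: one d_k x d_k kernel per input channel;
    # pointwise stage: one 1 x 1 x m kernel per output channel
    return d_k * d_k * m + m * n

d_k, m, n = 3, 116, 116
ratio = dws_conv_params(d_k, m, n) / standard_conv_params(d_k, m, n)
print(f"parameter ratio: {ratio:.4f}")  # matches 1/N + 1/D_k^2 ≈ 0.1197
```

With N = 116 and D_{k} = 3, the ratio is about 0.12, i.e., slightly above the 1/9 bound because of the 1/N term.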

The channel shuffle operation not only facilitates the information exchange among different channels but also reduces the computational effort of the model [
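The channel shuffle operation is commonly implemented as a reshape–transpose–reshape; below is a minimal PyTorch sketch of the standard ShuffleNet formulation (not code from the paper):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    # Reshape (N, C, H, W) -> (N, g, C//g, H, W), swap the group and
    # per-group channel axes, then flatten back so that channels from
    # different groups become interleaved.
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# Eight channels labeled 0..7; shuffling with g = 2 interleaves the halves.
x = torch.arange(8, dtype=torch.float32).view(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten().tolist())  # [0, 4, 1, 5, 2, 6, 3, 7]
```

The operation has no learnable parameters, which is why it exchanges information between channel groups at essentially no computational cost.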

Based on the characteristics of plant diseases, ShuffleNetV2 is selected as the backbone network in this study. Depthwise convolution only uses a single convolution kernel to extract image features, which is not suitable for image recognition in different resolutions, and thus, MixDWConv is used instead of depthwise convolution in the ShuffleNetV2 basic module. To strengthen the channel features, the ECA module is introduced in the ShuffleNetV2 network structure. The ReLU activation function easily yields large gradients in the network training process. Therefore, the ReLU activation function is replaced by the ReLU6 activation function.

The lightweight model ShuffleNetV2 is improved to overcome the problems of the large number of parameters and the high model complexity of the classical CNN. As shown in

In the basic and downsampling modules, the proposed model uses MixDWConv instead of the depthwise convolution of the ShuffleNetV2 model. Furthermore, the ReLU6 activation function is used instead of the ReLU activation function. The MixDWConv, ECA module, and ReLU6 activation function are further elaborated below.

When designing CNNs, one of the most critical and easily overlooked points regarding depthwise convolution is the size of the convolutional kernel. Although traditional depthwise convolution generally employs a convolutional kernel size of 3, recent studies [

Based on MobileNets, Tan et al. [

As stated in Section 2.3, while the 3 × 3 depthwise convolution is used in the ShuffleNetV2 basic module, the proposed model employs MixDWConv. As shown in

The MixDWConv operation involves several variables.

Number of groups g: The number of groups determines how many convolutional kernels of different sizes need to be used for the input tensor. In literature [

Size of convolutional kernels in each group: In theory, the kernel sizes can be arbitrary; however, without restriction, the kernel sizes of two groups may coincide, which is equivalent to merging them into one group. Therefore, a different kernel size must be set for each group. The smallest kernel size is set as 3 × 3 and increases monotonically by 2 for each subsequent group, i.e., the kernel size of the i^{th} group is 2i + 1. For example, in this experiment, g = 4 and the kernel sizes are {3 × 3, 5 × 5, 7 × 7, 9 × 9}. Thus, for an arbitrary number of groups, the kernel sizes are already determined, which simplifies the design process.

Number of channels in each group: The equal division method is used, i.e., the number of channels is divided into four equal groups, and the number of channels in each group is the same.
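The three design choices above can be sketched in PyTorch as follows; this is a minimal illustration assuming g = 4, equal channel splits, and kernel sizes 2i + 1, with the class name and exact integration into the ShuffleNetV2 basic module left out:

```python
import torch
import torch.nn as nn

class MixDWConv(nn.Module):
    """Mixed depthwise convolution sketch: split the channels into g equal
    groups and apply a depthwise convolution with a different kernel size
    (2i + 1 for group i) to each group."""
    def __init__(self, channels: int, g: int = 4):
        super().__init__()
        assert channels % g == 0, "channels must divide evenly into groups"
        self.split = channels // g
        self.convs = nn.ModuleList(
            nn.Conv2d(self.split, self.split, kernel_size=2 * i + 1,
                      padding=i,            # 'same' padding for kernel 2i + 1
                      groups=self.split,    # groups == channels -> depthwise
                      bias=False)
            for i in range(1, g + 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.split(x, self.split, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)

mix = MixDWConv(116, g=4)  # 116 channels, as in the Stage2 layer
y = mix(torch.randn(1, 116, 28, 28))
print(y.shape)  # torch.Size([1, 116, 28, 28])
```

Because each branch is depthwise, the total cost stays close to that of a single 3 × 3 depthwise convolution while the larger kernels enlarge the receptive field.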

The channel attention mechanism can effectively improve the performance of CNNs. Most attention mechanisms can improve the network accuracy, but they increase the computational burden. Wang et al. [
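A minimal sketch of an ECA-style module is shown below; the kernel size k = 3 and the class name are illustrative choices, not values taken from the paper:

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention sketch: global average pooling, a 1-D
    convolution across the channel dimension, and a sigmoid gate. Unlike
    SE-style attention, there are no fully connected layers, so the module
    adds only k weights (here k = 3)."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = x.mean(dim=(2, 3))                     # (N, C) channel descriptor
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # local cross-channel interaction
        return x * torch.sigmoid(y).unsqueeze(-1).unsqueeze(-1)

eca = ECA()
out = eca(torch.randn(2, 116, 28, 28))
print(out.shape)  # torch.Size([2, 116, 28, 28])
```

The negligible parameter count is consistent with the ablation results later in the paper, where adding ECA leaves the model size unchanged.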

The primary role of the activation function is to provide the network with the ability of nonlinear modeling to address the deficiency of the model representation capability, which has a crucial role in neural networks [

Here, the ReLU6 activation function is defined as ReLU6(x) = min(max(0, x), 6); by clamping the output at 6, it prevents activations, and hence gradients, from growing without bound.
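The clamping behavior can be observed directly in PyTorch, which provides ReLU6 as a built-in module:

```python
import torch
import torch.nn as nn

# ReLU passes large positive values through unchanged;
# ReLU6 clamps them to the interval [0, 6].
x = torch.tensor([-3.0, 0.5, 6.0, 100.0])
print(nn.ReLU()(x).tolist())   # [0.0, 0.5, 6.0, 100.0]
print(nn.ReLU6()(x).tolist())  # [0.0, 0.5, 6.0, 6.0]
```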

The experiment was performed using an Intel (R) Core (TM) i7-8700 CPU with the Windows 10 operating system, the PyTorch 1.7.1 deep learning framework, and the PyCharm development platform. During training, to ensure scientific and reliable results, in all experiments the stochastic gradient descent (SGD) optimizer is used for parameter updates, the loss function is the cross-entropy loss, the number of iterations is 30, and the batch size is 64.
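The stated training configuration can be sketched as a single optimization step; the model is a placeholder, and the learning rate and momentum values are assumptions not given in the text:

```python
import torch
import torch.nn as nn

# Placeholder classifier standing in for the improved ShuffleNetV2
# (25 output classes, matching the dataset table below).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 25))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # lr/momentum assumed
criterion = nn.CrossEntropyLoss()  # cross-entropy loss, as in the paper

# One dummy batch of size 64, the stated batch size.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 25, (64,))

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```

In the paper's setup, this step would be repeated over the training set for 30 iterations.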

The experiments are performed on the publicly available dataset PlantVillage [

By collating the data, the problems of uneven sample distribution and low contrast are identified in the crop pest and disease leaf images. Therefore, Python is used to enhance the sample data with random horizontal/vertical flip and exposure operations. The enhancement effect is shown in

Data category | Original images | Training set images | Test set images
---|---|---|---
Apple health | 329 | 1495 | 404
Apple scab | 378 | 1512 | 437
Apple black rot | 373 | 1491 | 418
Apple rust | 385 | 1510 | 416
Corn health | 406 | 1501 | 402
Corn gray spot | 411 | 1505 | 408
Corn rust | 417 | 1502 | 409
Corn leaf blight | 394 | 1506 | 394
Grape health | 339 | 1508 | 402
Grape black rot | 413 | 1500 | 436
Grape black measles | 431 | 1507 | 424
Grape leaf blight | 431 | 1506 | 415
Tomato health | 319 | 1503 | 406
Tomato spot blight | 426 | 1517 | 401
Tomato two-spotted spider mite | 336 | 1506 | 405
Tomato late blight | 382 | 1528 | 401
Tomato leaf mold | 381 | 1502 | 380
Tomato bacterial spot | 355 | 1500 | 425
Tomato target spot | 281 | 1514 | 487
Tomato early blight | 400 | 1500 | 400
Tomato mosaic virus | 389 | 1495 | 444
Tomato yellow leaf | 379 | 1500 | 400
Potato health | 386 | 1464 | 420
Potato early blight | 400 | 1500 | 400
Potato late blight | 400 | 1500 | 400
Total | 9541 | 37572 | 10334

Comparison of the accuracy and loss of the proposed model with the ShuffleNetV2 model shows that the proposed model converges faster than the ShuffleNetV2 model (

Models | Accuracy/% | Model size/MB | Parameters | Memory accesses/MB
---|---|---|---|---
ShuffleNetV2 1.0x | 98.83 | 4.94 | 1279229 | 20.84
ShuffleNetV2 1.5x | 98.88 | 9.64 | 2504249 | 29.32
ShuffleNetV2 2.0x | 98.78 | 20.71 | 5396221 | 39.50
MobileNetV2 | 99.10 | 8.66 | 2236682 | 74.25
MobileNetV3 | 99.12 | 5.93 | 1543481 | 16.19
EfficientNet | 98.71 | 15.57 | 11194137 | 79.40
EfficientNetV2 | 99.32 | 77.71 | 20209513 | 144.97
ResNet34 | 98.56 | 81.31 | 21297497 | 37.61
ResNet50 | 97.92 | 90.82 | 23559257 | 109.68
ResNet101 | 98.76 | 162.73 | 42551383 | 161.75
Ours | 99.43 | 5.23 | 1331428 | 20.88

The proposed model is compared with models proposed in previous studies [

Method | Basic model | Accuracy/% | Model size/MB
---|---|---|---
[ | VGG | 99.42 | 6.47
[ | ShuffleNetV2 | 99.24 | –
[ | AlexNet | 92.7 | 29.9
[ | MobileNet | 95.02 | 17.1
[ | Inception V3 | 95.62 | 87.5
Ours | ShuffleNetV2 | 99.43 | 5.23

To investigate whether the introduction of the attention module is effective for identifying plant diseases, a comparative experiment is conducted. The original model of ShuffleNetV2 is compared with the ShuffleNetV2 model comprising the channel attention mechanism Squeeze-and-Excitation Networks (SE), the mixed attention module CBAM, and the ECA module.

Models | Accuracy/% | Model size/MB | Parameters | Memory accesses/MB
---|---|---|---|---
ShuffleNetV2 1.0x | 98.83 | 4.94 | 1279229 | 20.84
ShuffleNetV2 + SE | 98.86 | 6.05 | 1596146 | 21.08
ShuffleNetV2 + CBAM | 98.90 | 5.32 | 1347123 | 20.89
ShuffleNetV2 + ECA | 99.01 | 4.94 | 1279229 | 20.84

To verify the effectiveness of various optimization methods in the proposed model, various optimization methods are compared with the ShuffleNetV2 1.0x model. The detailed experimental results are shown in

Models | Accuracy/% | Model size/MB | Parameters | Memory accesses/MB
---|---|---|---|---
ShuffleNetV2 1.0x | 98.83 | 4.94 | 1279229 | 20.84
ShuffleNetV2 + MixDWConv | 99.23 | 5.23 | 1331428 | 20.88
ShuffleNetV2 + ECA | 99.01 | 4.94 | 1279229 | 20.84
ShuffleNetV2 + ReLU6 | 99.08 | 4.94 | 1279229 | 20.84
ShuffleNetV2 + MixDWConv + ECA + ReLU6 | 99.43 | 5.23 | 1331428 | 20.88

The final improved ShuffleNetV2 model incorporates MixDWConv, the ECA mechanism, and the ReLU6 activation function to achieve the optimal result. A 0.6 percentage point improvement in accuracy is achieved compared to ShuffleNetV2 1.0x, at the cost of a small increase in the number of model parameters.

In this study, a lightweight model that is a modified version of ShuffleNetV2 is proposed. It uses MixDWConv in the basic module of ShuffleNetV2, i.e., all channels are divided into groups and convolution kernels of different sizes are applied to different groups. In the proposed model, g = 4 for MixDWConv. This subsection examines how different group counts in MixDWConv influence the model performance.

To solve the problems of the high complexity and large number of parameters of existing models for crop pest recognition, an improved ShuffleNetV2 crop pest recognition model was proposed. The depthwise convolution is replaced by MixDWConv, which adds a small number of parameters but significantly improves the recognition accuracy of the model. The proposed model also incorporates the ECA module to improve the recognition accuracy without increasing the number of model parameters, and the ReLU6 activation function is employed to prevent the generation of large gradients. The recognition accuracy of the proposed model on the public PlantVillage dataset is 99.43%, which makes it convenient to deploy on end devices with limited computing resources for subsequent research. Future studies will investigate methods to further reduce the number of parameters while maintaining the crop pest and disease recognition accuracy and comprehensively improving the model performance.

We would like to thank our professors for their instruction and our classmates for their help.

This study was supported by the Guangxi Key R&D Project (Gui Ke AB21076021) and the Project of Humanities and social sciences of “cultivation plan for thousands of young and middle-aged backbone teachers in Guangxi Colleges and universities” in 2021: Research on Collaborative integration of logistics service supply chain under high-quality development goals (2021QGRW044).

The authors declare that they have no conflicts of interest to report regarding the present study.