Feature Selection (FS) is a key preprocessing step in pattern recognition and data mining that mitigates the impact of irrelevant and redundant features on the performance of classification models. In recent years, metaheuristic algorithms have been widely applied to FS problems, so this paper proposes a Hybrid Binary Chaotic Salp Swarm Dung Beetle Optimization (HBCSSDBO) algorithm to improve FS performance. In this hybrid algorithm, the original continuous optimizer is converted into binary form by an S-type transfer function and applied to the FS problem. Combined with a K nearest neighbor (KNN) classifier, comparative FS experiments are carried out between the proposed method and four advanced metaheuristic algorithms on 16 UCI (University of California, Irvine) datasets. Seven evaluation metrics, including average fitness, average prediction accuracy, and average running time, are chosen to compare the algorithms. The selected datasets are also analyzed by grouping them into high-, medium-, and low-dimensional categories. Experimental results show that the HBCSSDBO feature selection method can obtain a good feature subset while maintaining high classification accuracy, and exhibits better optimization performance. In addition, statistical tests confirm the significant validity of the method.
With the development of the information industry and science and technology, and the growing maturity of modern data collection techniques, thousands of applications now produce enormous amounts of data, and this data in turn underpins further technological development. At the same time, data analysis and processing face a major challenge: the data dimensionality is often too large to extract the most valuable information. Therefore, dimension reduction is a very important step in data preprocessing [
FS reduces dimensionality by extracting a significant portion of features from the original dataset and using predetermined assessment metrics to eliminate features that are irrelevant, redundant, or noisy. Feature Extraction also performs dimension reduction, but it does so by creating new feature combinations, whereas FS does not modify the original features; for this reason, FS has been widely used in text mining and other fields [
An increasing number of algorithms have been used in the field of FS as a result of the ongoing development of metaheuristic algorithms [
Dung Beetle Optimization (DBO) [
A novel hybrid salp swarm dung beetle optimization approach to feature selection is introduced. Combining the salp swarm algorithm with the dung beetle optimization algorithm makes good use of the distinct strengths of each method.
Using this hybrid algorithm combined with a KNN classifier greatly improves classification accuracy.
The effectiveness of the hybrid algorithm as applied to the FS problem is explored by comparing it with four better-known metaheuristics on 16 UCI benchmark datasets, employing a variety of evaluation criteria and examining high-, medium-, and low-dimensional datasets separately.
The Friedman test was employed to see whether there was a significant difference between the outcomes produced from the comparison approach and the suggested hybrid feature selection method.
Dung Beetle Optimization (DBO) is a recent swarm intelligence optimization algorithm inspired by the social behavior of dung beetles in nature. The original authors divided the dung beetle population into four agent types (ball-rolling dung beetles, brood balls, small dung beetles, and stealing dung beetles) in fixed proportions to simulate different dung beetle behaviors. The specific proportions are shown in
To keep the ball moving in a straight course, dung beetles in the wild rely on celestial cues. The beetles' positions shift continually as they roll. The mathematical model of the dung beetles' ball-rolling is shown in
A dung beetle must dance to figure out its new course when it comes across an obstruction that prevents it from moving ahead. This dancing behavior is described as follows:
Dung beetles roll their balls to a secure spot, conceal them, and then lay their eggs to create a safer habitat for their larvae. In order to replicate the locations where female dung beetles deposit their eggs, the following boundary selection approach is suggested:
To mimic the foraging behavior of small dung beetles, the optimal foraging region is first determined to guide them, and the foraging boundary is defined as follows:
Certain dung beetles in the wild take dung balls from other dung beetles. The following is an updated position of the thief:
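The behaviors above are defined precisely by the referenced equations; as a rough illustration only, the ball-rolling and dancing updates can be sketched in Python. The coefficients `k` and `b`, the deflection probability, and the exact functional forms are assumptions based on common DBO descriptions, not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def roll_update(x_t, x_prev, x_worst, k=0.1, b=0.3):
    """Obstacle-free ball-rolling update (sketch): the beetle moves along
    its previous direction and away from the worst-known position.
    alpha is +/-1 to model occasional deflection from the course."""
    alpha = 1.0 if rng.random() > 0.1 else -1.0
    return x_t + alpha * k * x_prev + b * np.abs(x_t - x_worst)

def dance_update(x_t, x_prev):
    """Dancing behaviour when an obstacle blocks the path: the beetle
    reorients by a random tangent angle theta in (0, pi)."""
    theta = rng.uniform(0.0, np.pi)
    if np.isclose(theta, np.pi / 2):   # tan undefined; keep the position
        return x_t
    return x_t + np.tan(theta) * np.abs(x_t - x_prev)
```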
In 2017, Mirjalili et al. introduced a novel intelligent optimization technique called the Salp Swarm Algorithm (SSA) [
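The leader-follower mechanics of SSA referenced here can be sketched as follows. Treating the first half of the chain as leaders is a common implementation convention, and all parameter values are illustrative rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def ssa_step(pop, food, lb, ub, t, T):
    """One SSA iteration (sketch). pop: (n, dim) salp chain; food: the
    best solution found so far; t/T: current and maximum iteration."""
    n, dim = pop.shape
    # c1 decays with iterations, shifting from exploration to exploitation.
    c1 = 2.0 * np.exp(-(4.0 * t / T) ** 2)
    new = pop.copy()
    # Leaders move around the food source.
    for i in range(n // 2):
        c2 = rng.random(dim)
        c3 = rng.random(dim)
        step = c1 * ((ub - lb) * c2 + lb)
        new[i] = np.where(c3 < 0.5, food + step, food - step)
    # Followers average their position with the salp ahead of them.
    for i in range(n // 2, n):
        new[i] = 0.5 * (new[i] + new[i - 1])
    return np.clip(new, lb, ub)
```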
The FS problem is a binary problem with an upper bound of 1 and a lower bound of 0. Obviously, the continuous DBO cannot deal with this problem, so we convert the continuous algorithm into binary form through
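An S-shaped (sigmoid) transfer function is the standard device for this conversion; a minimal sketch follows, assuming the common S1 sigmoid form, since the exact variant is given in the referenced equation.

```python
import numpy as np

rng = np.random.default_rng(2)

def s_transfer(x):
    """S-shaped (sigmoid) transfer function mapping a continuous
    position component to a selection probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def binarize(position):
    """Convert a continuous position vector to a 0/1 feature mask:
    feature j is selected when a uniform draw falls below S(x_j)."""
    return (rng.random(position.shape) < s_transfer(position)).astype(int)
```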
To address the imbalance between exploration and exploitation and the tendency of the current DBO to stagnate in local optima, this paper adopts the following strategies to improve the original algorithm.
Chaos is an aperiodic phenomenon exhibiting asymptotic self-similarity and underlying order. Owing to its randomness, ergodicity, and complexity, it is widely used as a global optimization mechanism to avoid local optima during search, for example in decision system design [
To address problems such as low population diversity and unsatisfactory search results in the random initialization stage of DBO, the Bernoulli chaotic mapping method is introduced. It lets the dung beetle population traverse the solution space more thoroughly during initialization and improves overall search performance. The histogram of the Bernoulli chaotic mapping sequence is depicted in
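One common form of the Bernoulli map, and a chaotic population initializer built on it, can be sketched as follows; the parameter lambda = 0.4 and the seed value are illustrative assumptions, not necessarily the paper's settings.

```python
import numpy as np

def bernoulli_map(z, lam=0.4):
    """One step of the Bernoulli shift map (a common form, lambda in (0, 1))."""
    return z / (1.0 - lam) if z <= 1.0 - lam else (z - (1.0 - lam)) / lam

def chaotic_init(n, dim, lb, ub, seed=0.17):
    """Initialize an n x dim population by iterating the chaotic map and
    scaling the resulting sequence into [lb, ub]."""
    pop = np.empty((n, dim))
    z = seed
    for i in range(n):
        for j in range(dim):
            z = bernoulli_map(z)
            pop[i, j] = lb + z * (ub - lb)
    return pop
```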
The ball-rolling dung beetle plays an important role in DBO, and its position update is crucial to the algorithm's global exploration of the solution space. SSA, meanwhile, has a distinctive leader-follower mechanism, strong global exploration ability, and a simple structure. Thus, to enhance the algorithm's capacity for global exploration, we incorporate SSA into DBO to update the positions of the ball-rolling dung beetles. At the same time, the original SSA is improved by introducing the Levy flight strategy into the leader position update to perturb the current best position, allowing the algorithm to explore the solution space more widely. Levy flight strategy [
The variables above have already been defined; the Levy flight operator is calculated as follows:
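A Levy step is usually generated with Mantegna's algorithm; the sketch below assumes that construction with the conventional exponent beta = 1.5, which may differ from the paper's exact formulation.

```python
import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(3)

def levy(dim, beta=1.5):
    """Levy flight step via Mantegna's algorithm: step = u / |v|^(1/beta),
    with u ~ N(0, sigma^2) and v ~ N(0, 1)."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)
```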
The stealing dung beetle moves precisely around the dung ball (the global best location), so it plays a significant role in DBO's local exploitation capacity but also makes the algorithm prone to being trapped in local optima. To prevent a stealing dung beetle from halting its forward search once it finds a locally optimal position, a mutation operator is added to its position update. After each stealing dung beetle completes a position update, a mutation operation is carried out with a certain probability
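A uniform-reset mutation applied with a fixed probability is one simple realization of such an operator; in the sketch below the probability `p_m` and the uniform re-draw are illustrative assumptions, not the paper's exact operator.

```python
import numpy as np

rng = np.random.default_rng(4)

def mutate(position, lb, ub, p_m=0.1):
    """With probability p_m per component, re-draw that component
    uniformly in [lb, ub] to push a stealing beetle away from a
    local optimum it may be circling."""
    mask = rng.random(position.shape) < p_m
    fresh = rng.uniform(lb, ub, position.shape)
    return np.where(mask, fresh, position)
```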
By enhancing the DBO using the aforementioned tactics, the algorithm’s capacity to break out of local optima is effectively increased, its scope for global exploration is broadened, and a healthy balance between its capabilities for local exploitation and global exploration is struck. The flow chart applied to FS specifically is shown in
Import the data and divide it into training and test sets. Determine the fitness function and the parameter initialization settings.
Initialize the dung beetle population using the chaotic mapping according to
Use the enhanced SSA to update the ball-rolling dung beetles' positions within DBO.
The
After the dung beetle population is binarized via the transfer function, the iteratively selected features are fed into the KNN classifier for training.
Calculate the fitness value corresponding to the selected features; if
Output results.
The FS problem is a multi-objective optimization problem with two goals: selecting the smallest feature subset from the original features and increasing classification accuracy. Accordingly, the fitness function of the FS problem is defined as follows to balance the two objectives [
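FS studies commonly express this balance as a weighted sum of the classification error rate and the relative subset size; a sketch under that assumption follows (the weight alpha = 0.99 is a conventional choice in the literature, not necessarily the paper's setting).

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted FS fitness (to be minimized): alpha weights the KNN
    classification error, (1 - alpha) weights the fraction of
    selected features."""
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total
```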
The performance of the HBCSSDBO method for FS is verified in this study using 16 standard datasets that are all taken from the UCI Machine Learning Repository [
No.  Dataset  No. of instances  No. of features  Dimension 

1  Breastcancer  699  9  Low 
2  BreastEW  569  30  High 
3  CongressEW  435  16  Medium 
4  Exactly  1000  13  Low 
5  Exactly2  1000  13  Low 
6  HeartEW  270  13  Low 
7  IonosphereEW  351  34  High 
8  KrvskpEW  3196  36  High 
9  Mofn  1000  13  Low 
10  SonarEW  208  60  High 
11  SpectEW  267  22  Medium 
12  Tictactoe  958  9  Low 
13  Vote  300  16  Medium 
14  WaveformEW  5000  40  High 
15  WineEW  178  13  Low 
16  Zoo  101  16  Medium 
The K nearest neighbor (KNN) approach is a supervised learning technique that classifies new instances according to their distances from instances in the training set [
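A minimal KNN classifier of the kind described (Euclidean distance, majority vote) can be sketched as follows; this is an illustration, not the paper's exact implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Minimal KNN classifier: for each test point, take a majority
    vote among the labels of the k nearest training points by
    Euclidean distance."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(d)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)
```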
An Intel(R) Core(TM) i7-6700 machine with a 3.4 GHz CPU (Central Processing Unit) and 8 GB of RAM (Random Access Memory) was used to run the HBCSSDBO algorithm and the comparison algorithms. The comparison algorithms include the original DBO [
Methods  Parameters  Values 

HBCSSDBO  [0, 1], [0, 1], 0.5  
DBO  0.1, 0.3, 0.5  
SSA  [0, 1], [0, 1]  
BGWOPSO  0.5  
GSA  100 
To validate the suggested approach, a variety of evaluation indices are used in this study, including average classification accuracy (in %), mean fitness value, best and worst fitness values, fitness value variance, average number of selected features, and average computational time. The indices are calculated as follows:
Average classification accuracy
Mean fitness value
The best fitness value
The worst fitness value
Fitness value variance
Average number of selected features
Average computational time
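The exact formulas are given in the referenced equations; as a sketch, the seven indices can be aggregated from per-run records as follows (assuming a minimization fitness, so the best value is the smallest).

```python
import numpy as np

def summarize_runs(fitness_runs, acc_runs, n_feat_runs, time_runs):
    """Aggregate per-run records into the seven reported indices:
    average accuracy, mean/best/worst fitness, fitness variance,
    average subset size, and average runtime."""
    f = np.asarray(fitness_runs, dtype=float)
    return {
        "avg_accuracy": float(np.mean(acc_runs)),
        "mean_fitness": float(f.mean()),
        "best_fitness": float(f.min()),    # minimization problem
        "worst_fitness": float(f.max()),
        "fitness_var": float(f.var()),
        "avg_features": float(np.mean(n_feat_runs)),
        "avg_time": float(np.mean(time_runs)),
    }
```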
The outcomes of using the HBCSSDBO algorithm on 16 datasets are displayed in
No.  Dataset  Average accuracy (%)  Mean fitness  Feature selected  Computational time (s) 

1  Breastcancer  98.92  0.02  4.20  15.43 
2  BreastEW  96.90  0.03  3.00  15.69 
3  CongressEW  98.39  0.02  5.30  14.60 
4  Exactly  100.00  0.00  6.00  16.03 
5  Exactly2  78.65  0.22  6.90  16.17 
6  HeartEW  90.56  0.10  4.00  14.42 
7  IonosphereEW  96.71  0.03  4.50  15.04 
8  KrvskpEW  98.69  0.02  13.90  32.05 
9  Mofn  100.00  0.00  6.00  16.37 
10  SonarEW  97.56  0.03  8.10  13.76 
11  SpectEW  89.81  0.10  6.10  13.78 
12  Tictactoe  84.82  0.16  7.00  15.85 
13  Vote  97.83  0.02  4.00  14.04 
14  WaveformEW  86.02  0.14  17.60  58.34 
15  WineEW  99.43  0.01  3.30  13.73 
16  Zoo  99.50  0.01  5.20  14.06 
Average  94.61  0.06  6.57  18.71 
To confirm the efficacy of the fusion, the hybrid algorithm is compared against the original DBO and the original SSA. To further confirm the algorithm's improved performance, the hybrid Grey Wolf Particle Swarm Optimization method and the Gravitational Search Algorithm are chosen for comparison. The running results of each algorithm are shown in
Dataset  HBCSSDBO  BDBO  BSSA  BGWOPSO  BGSA 

Breastcancer  97.99  98.27  98.27  98.42  
BreastEW  96.55  96.02  95.13  95.66  
CongressEW  97.24  97.93  98.05  98.16  
Exactly  97.15  96.95  
Exactly2  77.55  77.85  78.00  78.15  
HeartEW  87.41  88.70  87.41  87.41  
IonosphereEW  96.29  94.71  94.43  93.43  
KrvskpEW  98.34  98.15  98.64  98.50  
Mofn  
SonarEW  96.59  96.59  95.85  95.61  
SpectEW  88.30  88.68  88.87  89.43  
Tictactoe  84.82  84.14  83.82  83.66  
Vote  97.83  97.50  97.83  98.00  
WaveformEW  85.34  84.81  85.58  85.99  
WineEW  98.00  97.43  98.00  98.86  
Zoo  96.50  98.50  98.00  98.00  
Average  93.43  93.52  93.73  93.73 
Dataset  HBCSSDBO  BDBO  BSSA  BGWOPSO  BGSA 

Breastcancer  4.20  3.8  4.30  4.60  
BreastEW  3.50  3.60  4.20  8.70  
CongressEW  5.30  4.80  5.50  5.10  
Exactly  6.20  6.10  
Exactly2  6.90  6.50  6.50  6.20  
HeartEW  4.40  4.60  4.70  
IonosphereEW  8.10  10.50  10.20  
KrvskpEW  15.60  18.40  21.00  20.80  
Mofn  
SonarEW  9.00  16.70  20.80  26.00  
SpectEW  7.50  9.10  8.80  9.60  
Tictactoe  7.00  7.10  7.90  7.10  
Vote  4.00  5.20  5.60  5.50  
WaveformEW  22.20  22.20  27.80  23.50  
WineEW  3.60  4.30  4.20  3.80  
Zoo  5.80  6.30  6.20  6.60  
Average  7.03  8.19  9.37  9.65 
Dataset  HBCSSDBO  BDBO  BSSA  BGWOPSO  BGSA 

Breastcancer  2.383E-02  2.132E-02  2.187E-02  2.078E-02  
BreastEW  3.533E-02  4.062E-02  4.959E-02  4.583E-02  
CongressEW  3.031E-02  2.330E-02  2.278E-02  2.139E-02  
Exactly  3.298E-02  3.489E-02  
Exactly2  2.268E-01  2.243E-01  2.228E-01  2.211E-01  
HeartEW  1.281E-01  1.152E-01  1.282E-01  1.283E-01  
IonosphereEW  3.809E-02  5.471E-02  5.825E-02  6.806E-02  
KrvskpEW  2.076E-02  2.339E-02  1.931E-02  2.065E-02  
Mofn  
SonarEW  3.530E-02  3.659E-02  4.452E-02  4.780E-02  
SpectEW  1.192E-01  1.162E-01  1.142E-01  1.090E-01  
Tictactoe  1.581E-01  1.649E-01  1.672E-01  1.696E-01  
Vote  2.395E-02  2.700E-02  2.470E-02  2.330E-02  
WaveformEW  1.507E-01  1.559E-01  1.497E-01  1.446E-01  
WineEW  2.257E-02  2.876E-02  2.303E-02  1.424E-02  
Zoo  3.828E-02  1.879E-02  2.368E-02  2.393E-02  
Average  6.867E-02  6.816E-02  6.646E-02  6.652E-02
Dataset  HBCSSDBO  BDBO  BSSA  BGWOPSO  BGSA 

Breastcancer  1.157E-02  5.556E-03  
BreastEW  9.761E-03  1.852E-02  1.819E-02  2.052E-02  
CongressEW  1.875E-03  1.325E-02  2.500E-03  
Exactly  
Exactly2  2.106E-01  2.175E-01  2.099E-01  2.133E-01  
HeartEW  5.885E-02  9.397E-02  5.962E-02  
IonosphereEW  1.532E-02  1.650E-02  3.123E-02  3.123E-02  
KrvskpEW  1.191E-02  1.219E-02  1.358E-02  1.696E-02  
Mofn  
SonarEW  1.167E-03  2.500E-03  3.667E-03  3.667E-03  
SpectEW  7.790E-02  7.926E-02  7.972E-02  6.058E-02  
Tictactoe  1.374E-01  1.344E-01  1.188E-01  1.240E-01  
Vote  2.500E-03  2.500E-03  2.500E-03  2.500E-03  
WaveformEW  1.414E-01  1.443E-01  1.350E-01  1.320E-01  
WineEW  2.308E-03  2.308E-03  
Zoo  3.750E-03  
Average  4.349E-02  4.249E-02  4.625E-02  4.197E-02
Dataset  HBCSSDBO  BDBO  BSSA  BGWOPSO  BGSA 

Breastcancer  4.006E-02  3.182E-02  4.940E-02  4.940E-02  
BreastEW  6.233E-02  7.985E-02  7.985E-02  7.342E-02  
CongressEW  4.864E-02  5.877E-02  6.190E-02  5.052E-02  
Exactly  2.883E-01  2.917E-01  
Exactly2  2.433E-01  2.365E-01  2.433E-01  
HeartEW  1.704E-01  1.688E-01  1.887E-01  1.872E-01  
IonosphereEW  1.014E-01  1.019E-01  1.164E-01  
KrvskpEW  3.289E-02  4.036E-02  2.709E-02  2.935E-02  
Mofn  
SonarEW  9.775E-02  7.527E-02  7.694E-02  1.013E-01  
SpectEW  1.704E-01  1.882E-01  1.531E-01  1.535E-01  
Tictactoe  2.036E-01  1.922E-01  1.914E-01  1.922E-01  
Vote  6.788E-02  5.200E-02  5.388E-02  7.038E-02  
WaveformEW  1.793E-01  1.686E-01  1.602E-01  1.619E-01  
WineEW  5.888E-02  6.042E-02  5.965E-02  5.811E-02  
Zoo  1.021E-01  1.015E-01  5.388E-02  5.450E-02  
Average
Dataset  HBCSSDBO  BDBO  BSSA  BGWOPSO  BGSA 

Breastcancer  1.113E-02  8.035E-03  1.360E-02  1.239E-02  
BreastEW  1.669E-02  1.945E-02  1.659E-02  1.888E-02  
CongressEW  1.655E-02  1.721E-02  1.850E-02  1.476E-02  
Exactly  8.971E-02  9.037E-02  
Exactly2  7.619E-03  1.217E-02  9.600E-03  6.656E-03  
HeartEW  3.207E-02  3.179E-02  3.683E-02  4.328E-02  
IonosphereEW  1.898E-02  2.490E-02  1.928E-02  2.415E-02  
KrvskpEW  3.799E-03  6.492E-03  8.353E-03  5.216E-03  
Mofn  
SonarEW  2.532E-02  3.454E-02  3.055E-02  3.569E-02  
SpectEW  3.416E-02  3.109E-02  2.496E-02  3.716E-02  
Tictactoe  2.308E-02  2.116E-02  2.295E-02  2.243E-02  
Vote  1.885E-02  1.811E-02  1.889E-02  2.682E-02  
WaveformEW  9.910E-03  1.107E-02  8.185E-03  9.770E-03  
WineEW  2.298E-02  2.092E-02  1.919E-02  1.928E-02  
Zoo  3.274E-02  3.291E-02  2.559E-02  2.565E-02  
Average  2.304E-02  2.278E-02  1.620E-02  1.779E-02
Dataset  HBCSSDBO  BDBO  BSSA  BGWOPSO  BGSA 

Breastcancer  15.43  15.10  16.78  17.48  
BreastEW  15.69  16.14  15.67  15.10  
CongressEW  14.60  15.37  15.09  14.60  
Exactly  16.45  17.75  16.88  16.39  
Exactly2  16.17  16.97  17.05  16.35  
HeartEW  14.42  14.45  14.49  14.66  
IonosphereEW  15.04  15.90  15.33  14.57  
KrvskpEW  34.24  37.72  38.94  33.91  
Mofn  16.37  16.26  17.22  16.88  
SonarEW  13.76  14.35  14.80  15.17  
SpectEW  14.37  15.48  14.90  15.35  
Tictactoe  15.85  16.70  16.64  17.75  
Vote  14.38  14.51  14.69  14.48  
WaveformEW  69.26  66.25  88.58  66.79  
WineEW  14.34  14.41  14.35  14.02  
Zoo  14.06  14.70  14.53  14.50  
Average 
As shown in
When comparing the average fitness, the best and worst fitness, and the variance across algorithms, the results are displayed in scientific notation, and the optimal results are bolded, to prevent small numerical differences from being hard to see.
The proposed technique performs better than the comparison algorithms on each of the seven assessment indices above. Compared with the BDBO feature selection approach, the HBCSSDBO method improves the average prediction accuracy and the average number of chosen features over the entire collection of datasets by 1.18% and 7.2%, respectively. From this analysis of average classification accuracy and average number of selected features, it is evident that the proposed approach can select a strong feature subset while also guaranteeing high classification accuracy. The examination of the variance and the best and worst fitness values shows that the proposed method improves substantially in search depth, exploration ability, and robustness. These results justify the fusion design: the embedded SSA leader and follower mechanisms help the dung beetle individuals explore the solution space more effectively, while the mutation operator and the chaotic initialization of the population, respectively, keep the algorithm from easily falling into local optima and preserve population diversity. Furthermore, the hybrid approach remains efficient: it incurs no extra time overhead in running time compared with the two original methods.
The datasets are split into three datasets of different dimensions based on the varying number of features, namely high dimensional datasets, medium dimensional datasets, and low dimensional datasets, in order to further assess the effectiveness of the suggested approach. Specific corresponding information is given in
To sum up, HBCSSDBO can obtain higher classification accuracy when applied to FS, can extract feature subsets more effectively from original datasets, especially in medium and high dimensional datasets, and does not perform poorly on low dimensional datasets. Overall, it is still ahead of the comparative algorithms.
To assess the statistical significance of the earlier obtained results, the experimental data were subjected to a Friedman test [
HBCSSDBO  BDBO  BSSA  BGWOPSO  BGSA  

Breastcancer  1  5  3  4  2 
BreastEW  1  2  3  5  4 
CongressEW  1  5  4  3  2 
Exactly  1  4  5  2  3 
Exactly2  1  5  4  3  2 
HeartEW  1  3  2  4  5 
IonosphereEW  1  2  3  4  5 
KrvskpEW  1  4  5  2  3 
Mofn  1  2  3  4  5 
SonarEW  1  2  3  4  5 
SpectEW  1  5  4  3  2 
Tictactoe  4  1  2  3  5 
Vote  5  4  1  3  2 
WaveformEW  1  5  4  2  3 
WineEW  1  5  2  4  3 
Zoo  1  3  4  5  2 
Sum  23  57  52  55  53 
Average  1.375  3.75  3.6875  3.125  3.0625 
F (t)  25.57  
3.87E-05  
Critical value  9.487729 
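Given per-algorithm rank sums over the N = 16 datasets, the Friedman chi-square statistic follows the standard formula sketched below; the value reported in the table may additionally reflect tie handling or an F-distribution variant of the test.

```python
def friedman_statistic(rank_sums, n_datasets):
    """Friedman chi-square from per-algorithm rank sums R_j over N
    datasets and k algorithms:
    chi2 = 12 / (N k (k+1)) * sum(R_j^2) - 3 N (k+1)."""
    k = len(rank_sums)
    return (12.0 / (n_datasets * k * (k + 1))) * sum(r * r for r in rank_sums) \
        - 3.0 * n_datasets * (k + 1)
```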
A pairwise Benjamini-Hochberg post hoc test with adjusted p values, using HBCSSDBO as the control algorithm, was significant for HBCSSDBO
Algorithm  p value  Rank  adj. critical value 


BDBO  0.0001075  1  0.0125 
BSSA  0.0001075  2  0.025 
BGSA  0.0013406  3  0.0375 
BGWOPSO  0.0075263  4  0.05 
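The Benjamini-Hochberg step-up rule compares the i-th smallest p-value with (i/m)·alpha, which matches the adjusted thresholds listed above; a minimal sketch:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: find the largest rank i
    with p_(i) <= (i/m) * alpha, then reject the hypotheses for the
    i smallest p-values. Returns a reject/keep flag per input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_i = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_i = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_i:
            reject[idx] = True
    return reject
```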
In this paper, HBCSSDBO, a new hybrid algorithm, is proposed to solve the feature selection problem. The method combines the respective strengths of DBO and SSA, realizes an organic integration of the two algorithms, and effectively addresses the feature selection challenge. The average fitness value, average classification accuracy, average number of selected features, variance of the fitness value, average running time, and the best and worst fitness values are the seven evaluation metrics used to assess the performance and stability of the proposed method on 16 UCI datasets. In addition, HBCSSDBO is compared with four metaheuristic feature selection methods: BDBO, BSSA, BGWOPSO, and BGSA. Across these assessment criteria, the experimental findings indicate that HBCSSDBO is the best option, with an average classification accuracy of 94.61% and an average of 6.57 selected features. The analysis of the best and worst fitness values and of the fitness variance shows that the hybrid method greatly improves both search accuracy and robustness. In addition, the hybrid method incurs no additional time cost in terms of running time. These results show that HBCSSDBO is highly competitive for the feature selection problem. The final statistical test verifies the significant effectiveness of the method.
In our future research, we will explore the application of the proposed method to the feature selection problem in different domains such as data mining, medical applications, engineering applications, and so on. We will try to combine machine learning algorithms other than KNN to study the performance of the HBCSSDBO method.
We thank the Liaoning Provincial Department of Education and Shenyang Ligong University for financial support of this paper.
This research was funded by the project Short-Term Electrical Load Forecasting Based on Feature Selection and Optimized LSTM with DBO, a Fundamental Scientific Research Project of the Liaoning Provincial Department of Education (JYTMS20230189), and by the project Application of a Hybrid Grey Wolf Algorithm in the Job Shop Scheduling Problem under the Research Support Plan for Introducing High-Level Talents to Shenyang Ligong University (No. 1010147001131).
The authors confirm contribution to the paper as follows: study conception and design: Wei Liu; data collection: Wei Liu; analysis and interpretation of results: Tengteng Ren; draft manuscript preparation: Tengteng Ren. All authors reviewed the results and approved the final version of the manuscript.
Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.
This article does not contain any research involving humans or animals.
The authors declare that they have no conflicts of interest to report regarding the present study.