Feature selection, or dimensionality reduction, can be viewed as a multi-objective minimization problem in which both the number of features and the error rate must be minimized. Although it is inherently multi-objective, current methods tend to treat feature selection as a single-objective optimization task. This paper presents an enhanced multi-objective grey wolf optimizer with Lévy flight and a mutation phase (LMuMOGWO) for tackling feature selection problems. The proposed approach integrates two effective operators into the existing Multi-objective Grey Wolf Optimizer (MOGWO): a Lévy flight and a mutation operator. The Lévy flight, a random walk whose jump sizes are drawn from the Lévy distribution, enhances the global search capability of MOGWO, with the objective of maximizing classification accuracy while minimizing the number of selected features. The mutation operator is integrated to re-introduce informative features that can help improve classification accuracy. As feature selection is a binary problem, the continuous search space is converted into a binary one using the sigmoid function. To evaluate the classification performance of a selected feature subset, the proposed approach employs a wrapper-based Artificial Neural Network (ANN). The effectiveness of LMuMOGWO is validated on 12 conventional UCI benchmark datasets and compared with two existing MOGWO variants, BMOGWO-S (sigmoid-based) and BMOGWO-V (tanh-based), as well as the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and Binary Multi-objective Particle Swarm Optimization (BMOPSO). The results demonstrate that the proposed LMuMOGWO approach can successfully evolve and improve a set of randomly generated solutions for a given optimization problem. Moreover, it outperforms the existing approaches in most cases in terms of classification error rate, feature reduction, and computational cost.

Rapid advances in technology have exponentially increased the dimensionality of data. As a result, effective and efficient management of such data has become more challenging. According to [

The KDD method includes a crucial step called data preprocessing, which is a preparatory phase that plays a significant role in obtaining valuable information from the data. Careful execution of this step is essential because any mistakes can make it difficult to extract useful insights. One aspect of data preprocessing is feature selection, which involves using various strategies to remove irrelevant and redundant attributes from a given dataset [

Broadly speaking, data mining utilizes techniques derived from different domains of knowledge, such as statistics and probability, and especially machine learning. The three main categories of machine learning are supervised learning (e.g., classification), unsupervised learning (e.g., clustering), and reinforcement learning. Redundant or irrelevant features can degrade a model’s performance [

However, feature selection is a sophisticated process because of the complicated interactions between features. When interacting with other features, a relevant feature may become redundant or irrelevant. Therefore, an ideal feature subset should be a collection of complementary features spanning the various characteristics of the classes so as to differentiate them better. Moreover, the difficulty is compounded by the vast search space, which grows exponentially with the number of original features [

Generally, feature selection is treated as a single-objective problem that either optimizes classification accuracy alone or aggregates classification accuracy and the number of selected features into a single objective. In fact, feature selection is a multi-objective problem with two main objectives: maximizing classification accuracy and minimizing the number of features [

The main aim of this research is to obtain an accurate estimation of the Pareto optimal solution, which involves an optimal trade-off between classification error rate and feature reduction. An enhanced multi-objective GWO based on Lévy flight and mutation phase (LMuMOGWO) for feature selection is proposed to achieve this aim. The proposed approach is compared with two MOGWO variants and two major benchmark multi-objective methods on 12 commonly used datasets with different classes, instances, and features. The main contributions of this work are summarized as follows:

We propose an enhanced multi-objective grey wolf optimizer incorporating Lévy flight and a mutation phase (LMuMOGWO) for feature selection problems. The proposed algorithm demonstrates excellent search ability in most cases compared to the existing approaches.

The mutation phase is incorporated to compensate for the long jumps of the Lévy flight and thereby obtain a better Pareto optimal trade-off between classification error rate and the number of features.

An extensive investigation of the proposed approach’s performance in obtaining Pareto optimal solutions (classification error rate and number of features) was conducted in comparison with two MOGWO variants and two prominent multi-objective algorithms: MOPSO and NSGA-II.

We have tested and evaluated the proposed approach’s classification and computational cost performance on twelve well-known datasets and benchmarked the performance with the existing methods.

The remainder of this paper is organized as follows. The state-of-the-art feature selection methods are presented in Section 2. Section 3 discusses the original MOGWO, and the Binary version BMOGWO-S. The developed enhanced multi-objective algorithm is explained in detail in Section 4. Section 5 provides the experimental setup, results, and discussion. The last section of this work presents the conclusion and provides research directions for future work.

Feature selection approaches can be grouped into filter-based and wrapper-based [

Researchers have used numerous wrapper methods to tackle the feature selection problem. First, exhaustive/complete search has been applied in a small number of studies [

Nonetheless, identifying suitable values between

Metaheuristic algorithms have been extensively utilized for feature selection to tackle the limitations of the conventional feature selection schemes. In [

Considering microarray data, the work in [

Recently, GWO has been extensively used to solve feature selection problems in diverse areas, including benchmarks problems as in [

Several multi-objective optimizations have been utilized for feature selection problems. In [

This section presents an overview of the original MOGWO mathematical model and the original binary version BMOGWO-S.

The GWO mathematical model is developed by using the alpha (α), beta (β), and delta (δ) wolves that represent the best, second best, and third best solutions, respectively, while the remaining wolves are represented by omega (ω).

Encircling the Prey

GWO provides a set of equations to mathematically model the prey-encircling behavior, given as follows:

where

Hunting Prey

A grey wolf can locate the position of prey and encircle it. Generally, alpha is the leader that guides the pack for hunting, and it is occasionally assisted by beta and delta to perform this task. Nevertheless, the optimal location of prey is unknown in a search space. To mathematically model the hunting behavior of grey wolves, it is assumed that alpha, beta, and delta have more knowledge that helps to locate the potential prey. Consequently, the best three solutions are recorded, and the remaining candidate solutions represented by (omegas) are forced to update their positions based on the positions of the best three search agents. This is mathematically modeled as follows:

where

Attacking Prey

Once the prey stops moving, grey wolves finish the hunt by attacking it. Approaching the prey is mathematically modeled by decreasing

where
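As an illustration of the model described above, the standard GWO position update (encircling, hunting guided by the three leaders, and the linearly decreasing coefficient a) can be sketched in Python. This is an illustrative re-implementation, not the paper's MATLAB code; the population size, dimensionality, and linear decay schedule are assumptions:

```python
import numpy as np

def gwo_update(X, X_alpha, X_beta, X_delta, a, rng):
    """One position update of the standard GWO model.

    X                    : (n_wolves, dim) current positions
    X_alpha/X_beta/X_delta: (dim,) best three solutions
    a                    : coefficient, decreased linearly from 2 to 0 over iterations
    """
    new_X = np.empty_like(X)
    for i, x in enumerate(X):
        guided = []
        for leader in (X_alpha, X_beta, X_delta):
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            A = 2 * a * r1 - a          # |A| < 1 drives attacking, |A| > 1 drives searching
            C = 2 * r2                  # random emphasis on the leader position
            D = np.abs(C * leader - x)  # distance to the leader (encircling step)
            guided.append(leader - A * D)
        # omega wolves move to the average of the alpha-, beta-, delta-guided positions
        new_X[i] = np.mean(guided, axis=0)
    return new_X

rng = np.random.default_rng(0)
X = rng.random((8, 5))                          # 8 wolves in a 5-dimensional space (assumed)
a = 2 * (1 - 10 / 100)                          # linear decay at iteration 10 of 100 (assumed)
X_new = gwo_update(X, X[0], X[1], X[2], a, rng)
```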

GWO was initially proposed for single-objective optimization problems. Expressly, the original GWO approach is not capable of solving multi-objective problems. As a result, Mirjalili et al. developed a multi-objective GWO (MOGWO) [

An archive that can store non-dominated Pareto front solutions.

A selection strategy that utilizes the archive to choose the best leader (alpha), the second-best leader (beta), and the third-best leader solutions (delta).

The archive is managed by a controller that decides which solutions should be saved and handles the case in which the archive becomes full. The new non-dominated solutions are iteratively compared with the stored ones, which can lead to several situations, considered as follows:

A new solution is not allowed to enter the archive if it is dominated by at least one of the stored solutions; conversely, any stored solutions dominated by the new solution are removed from the archive.

If both the new and previously stored solutions do not dominate each other, then the new solution is added to the archive.

If the archive is full, the grid mechanism omits one of the current archive solutions to allow the newly obtained non-dominated solution to be stored.
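The archive controller rules above can be sketched as follows. This is an illustrative Python sketch, not the paper's implementation; in particular, the crowding-based removal of the grid mechanism is simplified here to dropping an arbitrary entry:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization of all objectives)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate, max_size=50):
    """Archive-controller sketch following the rules described above.

    `archive` is a list of objective vectors. The grid mechanism that removes
    a member from the most crowded segment is simplified to popping the last
    entry, purely for illustration.
    """
    if any(dominates(stored, candidate) for stored in archive):
        return archive                       # rejected: dominated by a stored solution
    archive = [s for s in archive if not dominates(candidate, s)]  # remove dominated entries
    if len(archive) >= max_size:
        archive.pop()                        # make room (simplified grid removal)
    archive.append(candidate)                # mutually non-dominated: admit the candidate
    return archive

# (4, 0.07) is non-dominated w.r.t. both stored points, so it is added
arch = update_archive([(3, 0.10), (5, 0.05)], (4, 0.07))
```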

MOGWO has shown a better convergence speed; nonetheless, it still suffers from being trapped in local optima as well as poor stability behavior. The main reasons for such performance can be summarized as follows:

MOGWO exhibits randomness only at the initialization stage, where the positions of the wolves are randomly initialized. Although MOGWO has a random factor, the leader wolves have a far greater influence on updating the wolves' positions during the iterative process than the random factor does. Therefore, the MOGWO algorithm relies heavily on the initial values and lacks self-regulation ability.

MOGWO chooses the best three leaders, i.e., α, β, and δ, from the archive even if the archive has fallen into a local optimum. Thus, it is essential to enhance the exploration ability of MOGWO, particularly at the later stages of the optimization process.

Since the leader wolves strongly attract the remaining wolves, the pack blindly follows them, along with the non-dominated solutions located around the leaders. This causes grey wolves to ignore the non-dominated solutions placed beside them; in fact, grey wolves might pass over other non-dominated solutions while following the leaders. The optimization ability of MOGWO could therefore be significantly enhanced if grey wolves were supported with independent searching.

In the Pareto front, it is hard to compare solutions. To tackle this problem, a leader selection technique is employed. GWO has three leaders, known as the alpha (best solution), beta (second-best solution), and delta (third-best solution) wolves, which guide the remaining wolves toward the optimal solution. The leader selection strategy chooses the least crowded segment of the objective space and offers one of its non-dominated solutions as a leader.

Utilizing the likelihood of each hypercube, the roulette-wheel method is applied as a selection approach as follows:

where N_i is the number of obtained Pareto optimal solutions in the i-th segment.
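A minimal sketch of this roulette-wheel leader selection, assuming the standard MOGWO weighting in which a segment holding N_i archive members is weighted proportionally to c/N_i (the constant c, the example archive, and the `grid_index` mapping are assumptions for illustration):

```python
import random
from collections import Counter

def select_leader(archive, grid_index, c=4.0, rng=random.Random(0)):
    """Roulette-wheel leader selection sketch.

    Each archive member k lives in segment grid_index[k]; a member in a
    segment with N_i occupants gets weight c / N_i, so members of sparse
    (less crowded) segments are more likely to be chosen as leaders.
    """
    counts = Counter(grid_index)
    weights = [c / counts[grid_index[k]] for k in range(len(archive))]
    return rng.choices(archive, weights=weights, k=1)[0]

archive = ["s0", "s1", "s2", "s3"]   # hypothetical archive members
grid = [0, 0, 0, 1]                  # segment 1 holds a single (least crowded) solution
leader = select_leader(archive, grid)
```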

The conventional MOGWO was initially developed to solve optimization problems in a continuous search space. MOGWO cannot be directly applied to address binary problems, including multi-objective feature selection optimization problems. As a result, a binary MOGWO version was developed in our previous study by utilizing the sigmoid function [

where

where

where
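The sigmoid-based conversion from the continuous to the binary search space can be sketched as follows. This is an illustrative Python sketch of the standard S-shaped transfer scheme (map each coordinate to a selection probability, then threshold against a uniform random number); the sample positions are hypothetical:

```python
import math
import random

def sigmoid(x):
    """S-shaped transfer function mapping a continuous value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng=random.Random(1)):
    """Convert a continuous position vector to a 0/1 feature mask.

    A coordinate becomes 1 (feature selected) when a uniform random number
    falls below its sigmoid-transformed value, and 0 otherwise.
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]

mask = binarize([-6.0, 0.0, 6.0])   # hypothetical continuous position
```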

The previous BMOGWO-S [

A possible solution to the aforementioned problems is to integrate Lévy flight search patterns into the search mechanism of BMOGWO-S, enabling deeper search patterns. This integration can significantly assist BMOGWO-S in achieving more effective global searches, so the stagnation issue will also be remedied. Additionally, the Lévy-integrated BMOGWO-S should enhance the quality of the candidate solutions throughout the simulation process.

The Lévy distribution is utilized to calculate the random-walk step length. The walk usually begins from one of the best-established positions; the Lévy flight then produces a new generation at a distance dependent on the Lévy distribution, from which the best generation is selected. A Lévy flight involves two basic steps: selecting a random direction and generating a step.

Hence, the power-law is utilized to balance the Lévy distribution step size based on the following equation [

where 0 < β < 2 denotes Lévy index for stability control, and

where γ > 0 is a scale distribution parameter while μ is a location parameter.

Generally, the Fourier transform is used to define the Lévy distribution [

where α is a skewness factor that can have a value in the range [−1, 1], and β ∈ (0, 2) denotes the Lévy index.

Mantegna’s algorithm is utilized to estimate the step length of a random walk based on the following equation [

where μ and ν follow a normal distribution.

where

For calculating the step size, the following mathematical formulation is utilized:
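Mantegna's step-length computation can be sketched as follows. This is an illustrative Python sketch assuming the standard formulation with β = 1.5: μ ~ N(0, σ²), ν ~ N(0, 1), and step = μ / |ν|^(1/β), where σ matches the `Sigma` expression in the parameter table:

```python
import math
import random

def levy_step(beta=1.5, rng=random.Random(42)):
    """One Lévy-flight step length via Mantegna's algorithm.

    sigma is the stability-dependent scale of mu; dividing a normal draw by
    |nu|**(1/beta) yields mostly small steps with occasional long jumps.
    """
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    mu = rng.gauss(0, sigma)   # mu ~ N(0, sigma^2)
    nu = rng.gauss(0, 1)       # nu ~ N(0, 1)
    return mu / abs(nu) ** (1 / beta)

steps = [levy_step() for _ in range(5)]   # heavy-tailed: occasional long jumps
```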

Generally, the main steps of the proposed Lévy-integrated BMOGWO-S are similar to those of BMOGWO-S. Here, the Lévy distribution is integrated to generate long steps toward the optimal solutions. Therefore, the alpha, beta, and delta positions of BMOGWO-S are updated as follows:

The Lévy flight helps to address stagnation in local optima; however, its tendency to make long repositioning moves may cause it to jump across an important/relevant feature without selecting it (so-called leap-overs). Therefore, to prevent this problem, a mutation operator is integrated to provide more informative features that can decrease classification errors. This operator utilizes a nonlinear function, p_m, to control the mutation range and the mutation probability of wolves. Iteratively, p_m is updated as follows:

where μ is the mutation rate. Algorithm 1 provides the pseudocode of the proposed mutation. Moreover, a solution is represented as a one-dimensional binary vector whose length equals the number of features. Each dimension of this vector can have a value of either 0 (feature not selected) or 1 (feature selected).
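The bit-flip rule on such a binary vector can be sketched as follows. Since the exact p_m schedule is defined by the paper's equation (and Algorithm 1), the quadratic decay below is purely an assumed illustration; only the flip-with-probability-p_m rule follows the description above:

```python
import random

def mutate(mask, p_m, rng=random.Random(7)):
    """Bit-flip mutation sketch for a binary feature mask.

    Each bit is flipped with probability p_m, which can re-introduce
    informative features skipped by a long Lévy jump.
    """
    return [1 - b if rng.random() < p_m else b for b in mask]

# Assumed nonlinear schedule (illustrative only; the paper defines its own
# p_m update): decay from the mutation rate toward 0 as iteration t -> T.
mu_rate, t, T = 0.1, 10, 100
p_m = mu_rate * (1 - t / T) ** 2

child = mutate([1, 0, 0, 1, 0], p_m)   # hypothetical 5-feature mask
```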

This work evaluates the selected subsets of features based on the following two main objectives:

Minimization of features number.

Minimization of classification error rate.

Hence, the multi-objective feature selection minimization problem is mathematically formulated as denoted by

where f_1(x) denotes the number of selected features and f_2(x) denotes the classification error rate.
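The two objectives above can be expressed as an objective vector together with a Pareto-dominance test (illustrative Python; the example masks and error rates are hypothetical):

```python
def objectives(mask, error_rate):
    """Objective vector of a candidate mask: f1 = number of selected
    features, f2 = classification error rate; both are minimized."""
    return (sum(mask), error_rate)

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

f_a = objectives([1, 0, 1, 0], 0.04)   # 2 features, 4% error (hypothetical)
f_b = objectives([1, 1, 1, 0], 0.06)   # 3 features, 6% error (hypothetical)
# f_a dominates f_b: fewer features AND a lower error rate
```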

This section presents the experimental setup, followed by the results of the proposed algorithm against the existing methods. First, the results of the proposed LMuMOGWO are described; then, the comparison between the proposed approach and the existing methods is explained.

To investigate the performance of the proposed approach, it is compared with two multi-objective GWO-based feature selection algorithms, namely the sigmoid-transfer-function-based BMOGWO-S [

Dataset | #Features | #Samples | #Classes | Domain |
---|---|---|---|---|
Breastcancer | 9 | 699 | 2 | Biology |
WineEW | 13 | 178 | 3 | Chemistry |
HeartEW | 13 | 270 | 2 | Biology |
Zoo | 16 | 101 | 7 | Artificial |
Lymphography | 18 | 184 | 4 | Biology |
SpectEW | 22 | 267 | 2 | Biology |
BreastEW | 30 | 569 | 2 | Biology |
Ionosphere | 34 | 351 | 2 | Electromagnetic |
KrvskpEW | 36 | 3196 | 2 | Game |
WaveformEW | 40 | 5000 | 3 | Physics |
SonarEW | 60 | 208 | 2 | Biology |
PenglungEW | 325 | 73 | 7 | Biology |

For the experiments, instances are split randomly into a training set (70% of the dataset), and the remaining 30% is treated as the validation and test set. In the training phase, each potential solution represents a feature subset. Due to their stochastic behavior, all algorithms were executed 20 times with random seeds. An Artificial Neural Network (ANN) is applied in the inner loop of the training process to evaluate the classification accuracy of the selected feature subsets. Once training is completed, the test set is utilized to evaluate the selected features by calculating the testing classification accuracy. All algorithms are implemented in MATLAB R2017a on an Intel(R) Core™ i7-6700 machine with 16 GB of RAM and a 3.4 GHz CPU. The parameters of the proposed algorithm and the benchmark approaches are presented in

Parameters | LMuMOGWO | BMOGWO-S | BMOGWO-V | BMOPSO | NSGA-II |
---|---|---|---|---|---|
No of iterations | 100 | 100 | 100 | 100 | 100 |
No of grey wolves/population size | 8 | 8 | 8 | 8 | 8 |
Archive/repository size | 50 | 50 | 50 | 50 | 50 |
Alpha | alpha = 0.1 | alpha = 0.1 | alpha = 0.1 | alpha = 0.1 | - |
nGrid | nGrid = 10 | nGrid = 10 | nGrid = 10 | nGrid = 10 | - |
Beta | beta = 4 | beta = 4 | beta = 4 | beta = 4 | - |
Gamma | gamma = 2 | gamma = 2 | gamma = 2 | gamma = 2 | - |
Mutation rate | mu = 0.1 | - | - | mu = 0.1 | mu = 0.1 |
Inertia weight | - | - | - | w = 0.5 | - |
Inertia weight damping rate | - | - | - | wdamp = 0.99 | - |
c1 | - | - | - | c1 = 1 | - |
c2 | - | - | - | c2 = 2 | - |
No of offsprings | - | - | - | - | 2*round(pCrossover*nPop/2) |
Mutation percentage | - | - | - | - | pMutation = 0.4 |
No of mutants | - | - | - | - | nMutation = round(pMutation*nPop) |
Sigma | (gamma(1 + beta) * sin(pi * beta/2)/(gamma((1 + beta)/2) * beta * 2^((beta - 1)/2)))^(1/beta) | - | - | - | - |

The experimental results of LMuMOGWO for twelve datasets are presented in

Taking the Breast Cancer dataset as an example, LMuMOGWO produces three non-dominated solutions that select approximately 33% of the original features (3 out of 9), of which two subsets attain lower classification error rates than using all original features. For the HeartEW dataset, LMuMOGWO produces four non-dominated solutions that select approximately 25% of the original features (4 out of 13), and three of the subsets achieve a better classification error rate than using all original features. For the WineEW dataset, four non-dominated solutions are produced (4 out of 13 features), of which three feature subsets achieve better classification error rates. In the Zoo dataset, only three non-dominated feature subsets are produced from the 16 original features, of which two solutions accomplish better classification error rates than the original features. Similarly, in the Lymphography dataset, three non-dominated feature subsets are produced from the 18 original features, all of which attain better classification accuracy than utilizing the original features. These are small datasets, and in most of them the number of features is reduced to 30% or less while attaining higher classification accuracy than using the original features.

This subsection investigates the performance of the proposed LMuMOGWO on the twelve datasets against four benchmark methods. Two are based on MOGWO: 1) BMOGWO-S and 2) BMOGWO-V. In addition, two effective, widely used multi-objective approaches, namely BMOPSO and NSGA-II, are used for the comparison.

First, the comparison is carried out on twelve datasets, as illustrated in

Similarly, LMuMOGWO achieves a lower error rate than using all features, and its error rate outperforms those achieved by the benchmark methods. The HeartEW dataset clearly demonstrates the better performance of LMuMOGWO against its counterpart approaches in terms of feature reduction and lower error rate: LMuMOGWO selects four features, whereas BMOGWO-S, BMOGWO-V, BMOPSO, and NSGA-II select 5, 5, 6, and 7, respectively. This means LMuMOGWO can further decrease the number of selected features while preserving or achieving a lower classification error rate than using all features, and a better one than the counterpart methods. In the WineEW dataset, LMuMOGWO produces four non-dominated solutions that dominate all benchmark approaches in terms of feature reduction and classification accuracy in all cases, evolving with a lower error rate and a smaller number of features.

For the Zoo dataset, LMuMOGWO produced non-dominated solutions that dominate all benchmark approaches in terms of both feature reduction and classification error rate in all cases. Four non-dominated solutions are produced, which evolve a lower error rate as well as a smaller number of features, demonstrating the better performance of the proposed LMuMOGWO against the benchmark methods; the selected subsets contain 1, 3, or 2 features with error rates of 1.20, 0.079, and 0.35, respectively. In the Lymphography dataset, three non-dominated solutions are obtained by LMuMOGWO, involving higher classification accuracy and fewer selected features, whereas the BMOGWO-S, BMOGWO-V, BMOPSO, and NSGA-II algorithms produce 5, 7, 5, and 8 non-dominated solutions that involve more features and higher error rates than LMuMOGWO. In the SpectEW dataset, the proposed LMuMOGWO produces a number of non-dominated solutions similar to BMOGWO-S, more than BMOGWO-V and BMOPSO, and better than NSGA-II; moreover, LMuMOGWO produces a non-dominated solution with a better error rate than all algorithms and fewer features than BMOPSO and NSGA-II. LMuMOGWO produces 7 non-dominated solutions whose error rates are better than using all features, and the error rates obtained by LMuMOGWO for all non-dominated solutions are better than those of all benchmark methods. Similarly, in the BreastEW dataset, the proposed LMuMOGWO produces 10 non-dominated solutions, more than BMOPSO (7) and NSGA-II (8) but fewer than BMOGWO-S (11) and BMOGWO-V (14). However, the non-dominated solutions achieved by LMuMOGWO are much better than those of all benchmark schemes for both classification accuracy and feature reduction in all cases.

The IonosphereEW dataset also shows that LMuMOGWO clearly outperforms all benchmark algorithms in attaining higher classification accuracy and better feature reduction. Five non-dominated solutions are produced by LMuMOGWO, evolving lower classification errors and fewer features than those produced by the benchmark approaches. This is because of the new mechanisms integrated into the algorithm, the Lévy flight and the mutation operator, which help select the most informative features with a lower error rate. Similarly, in the KrvskpEW dataset, the proposed LMuMOGWO produces non-dominated solutions that dominate all benchmark approaches regarding feature reduction and error rate in all cases; sixteen non-dominated solutions are produced, evolving a lower classification error rate and fewer features.

In the Waveform dataset, the proposed LMuMOGWO produces 10 non-dominated solutions that dominate all benchmark approaches for feature reduction and error rate in all cases, reducing the original 40 features to at most 12 while achieving a much better classification error rate than using all features. When the subset is reduced to a single feature, the classification error rate becomes high, which is expected and illustrates the conflict between the two objectives. Similarly, in the SonarEW dataset, the proposed LMuMOGWO produces nine non-dominated solutions that dominate all benchmark approaches in terms of lower error rate and feature reduction in all cases, evolving a lower error rate as well as a smaller number of features. It is also observed that LMuMOGWO reduces the number of features from 60 to at most 12, further improving classification performance; even when the benchmark methods produce fewer non-dominated solutions, those solutions involve more features and higher classification error rates than LMuMOGWO's. Lastly, in the PengLungEW dataset, the proposed LMuMOGWO clearly outperforms all benchmark methods in producing non-dominated solutions that involve fewer features and achieve higher classification accuracy: its five non-dominated solutions select at most 7 features with a lower classification error rate than the other algorithms, especially BMOPSO and NSGA-II, whose non-dominated solutions involve a relatively large number of features.

From the obtained results, LMuMOGWO dominates all benchmark approaches on classification accuracy and the number of selected features in most cases. For instance, on the Breast Cancer, HeartEW, Zoo, and Lymphography datasets, the proposed LMuMOGWO produces non-dominated solutions better than those of BMOGWO-S, BMOGWO-V, BMOPSO, and NSGA-II in terms of minimizing the error rate and reducing the number of features. When the non-dominated solutions of LMuMOGWO are compared with those of BMOGWO-S, BMOGWO-V, BMOPSO, and NSGA-II on the BreastEW, KrvskpEW, SpectEW, and IonosphereEW datasets, LMuMOGWO achieves superior performance in addressing feature selection problems with more than 30 dimensions. In addition, LMuMOGWO achieves better feature reduction and significantly improves classification accuracy, owing to the effects of the mutation operator and the embedded Lévy flight mechanism on balancing the exploration and exploitation of the algorithm. Similarly, on the Waveform, SonarEW, and PenglungEW datasets, the proposed LMuMOGWO achieves better performance, particularly in selecting fewer features. For example, in the PenglungEW dataset, the maximum numbers of selected features for BMOPSO, NSGA-II, BMOGWO-V, and BMOGWO-S are 129, 126, 38, and 14, respectively, whereas the proposed LMuMOGWO selects only 7 effective features with a better classification error rate. This shows that LMuMOGWO can remove about 95% of the original features.

This is because the proposed Lévy flight and mutation embedded strategies efficiently balance the exploration and the exploitation performed by the BMOGWO-S algorithm. The proposed Lévy flight strategy enriches the explorative behavior of BMOGWO-S by producing long jumps; to avoid the drawback of such long jumps, which can cause some informative features to be ignored, the mutation operator is integrated to add the most applicable features. These mechanisms can be observed when LMuMOGWO discovers unseen areas of the feature selection search space. Accordingly, the new mechanisms enhance BMOGWO-S in obtaining the desired balance between global and local search and avoiding the local optima problem. We can observe from the obtained results that the proposed LMuMOGWO, which incorporates Lévy flight and mutation, overcomes the local optima drawback of the previously proposed BMOGWO-S approach. The results have shown that LMuMOGWO achieves superior classification accuracy and feature reduction compared with BMOGWO-S, BMOGWO-V, BMOPSO, and NSGA-II. Overall, the algorithms can be ranked according to the results as follows: 1) LMuMOGWO, 2) BMOGWO-S, 3) BMOGWO-V, 4) BMOPSO, 5) NSGA-II.

For further comparison, statistical results are presented that show how many features are selected by LMuMOGWO, BMOGWO-S, BMOGWO-V, NSGA-II, and BMOPSO. Firstly, as

Dataset | Statistic | LMuGWO | BGWO-S | BGWO-V | BMOPSO | NSGA-II |
---|---|---|---|---|---|---|
Breastcancer | Min | 1 | 1 | 1 | 1 | 1 |
 | Max | 4 | 6 | 4 | 7 | |
 | Range | 3 | 5 | 3 | 6 | |
 | STD | 1.29 | 1.92 | 1.29 | 2.16 | |
 | Average | 2.5 | 3.2 | 2.5 | 4 | |
WineEW | Min | 1 | 1 | 1 | 1 | 1 |
 | Max | 6 | 8 | 7 | 6 | |
 | Range | 5 | 7 | 6 | 5 | |
 | STD | 1.87 | 2.45 | 2.30 | 1.87 | |
 | Average | 3.5 | 4.5 | 3.4 | 3.5 | |
HeartEW | Min | 1 | 1 | 1 | 1 | 1 |
 | Max | 5 | 6 | 6 | 7 | |
 | Range | 4 | 5 | 5 | 6 | |
 | STD | 1.58 | 1.92 | 1.87 | 2.16 | |
 | Average | 3 | 3.2 | 3.5 | 4 | |
Zoo | Min | 1 | 1 | 1 | 1 | 1 |
 | Max | 5 | 6 | 7 | 6 | |
 | Range | 4 | 5 | 6 | 5 | |
 | STD | 1.58 | 1.87 | 2.45 | 1.87 | |
 | Average | 3 | 3.5 | 4.5 | 3.5 | |
Lymphography | Min | 1 | 1 | 1 | 2 | 1 |
 | Max | 5 | 7 | 6 | 9 | |
 | Range | 4 | 6 | 4 | 8 | |
 | STD | 1.58 | 2.16 | 1.58 | 2.67 | |
 | Average | 3 | 4 | 4 | 4.63 | |
SpectEW | Min | 1 | 1 | 1 | 3 | 1 |
 | Max | 8 | 11 | | | |
 | Range | 6 | 6 | 6 | 10 | |
 | STD | 1.87 | 3.49 | | | |
 | Average | 4 | 4 | 5.5 | 5.25 | |
BreastEW | Min | 1 | 1 | 1 | 6 | 1 |
 | Max | 12 | 15 | 13 | 13 | |
 | Range | 9 | 11 | 14 | 12 | |
 | STD | 3.02 | 3.65 | 4.31 | 4.13 | |
 | Average | 6 | 7.57 | 9.14 | 5.75 | |
Ionosphere | Min | 1 | 1 | 1 | 4 | 2 |
 | Max | 7 | 13 | 12 | 17 | |
 | Range | 6 | 12 | 8 | 15 | |
 | STD | 2.16 | 4.35 | 2.97 | 5.45 | |
 | Average | 4 | 6.7 | 7.4 | 8 | |
KrvskpEW | Min | 1 | 1 | 1 | 8 | 5 |
 | Max | 18 | 18 | 19 | 24 | |
 | Range | 17 | 17 | 18 | 19 | |
 | STD | 4.99 | 5.02 | 5.01 | 6.86 | |
 | Average | 8.27 | 12 | 12.38 | | |
WaveformEW | Min | 2 | 1 | 9 | 3 | |
 | Max | 19 | 15 | 20 | 20 | |
 | Range | 17 | 14 | 17 | | |
 | STD | 4.78 | 4.20 | 5.78 | | |
 | Average | 9.92 | 6.83 | 12.71 | 10.5 | |
SonarEW | Min | 2 | 12 | 6 | | |
 | Max | 16 | 20 | 26 | 14 | |
 | Range | 14 | 19 | 14 | 8 | |
 | STD | 5.59 | 5.21 | 5.00 | 2.93 | |
 | Average | 8.57 | 7.25 | 17 | 9.88 | |
PenglungEW | Min | 1 | 1 | 1 | 116 | 100 |
 | Max | 14 | 38 | 129 | 126 | |
 | Range | 13 | 37 | 13 | 26 | |
 | STD | 4.15 | 11.41 | 5.01 | 9.61 | |
 | Average | 6.71 | 11.56 | 120.5 | 107.43 | |

Secondly, considering datasets such as BreastEW, Ionosphere, SpectEW, and KrvskpEW, LMuMOGWO achieves the smallest average number of features compared with the benchmark methods, except in the SpectEW and KrvskpEW datasets: BMOGWO-V has the best average (8.27) in KrvskpEW, yet LMuMOGWO obtains a lower classification error rate. Similarly, in the SpectEW dataset, BMOGWO-V has the best average (3.67), yet LMuMOGWO attains a lower classification error rate. Following the ‘No Free Lunch’ theorem, no algorithm is the best for all problems [

As a result, the proposed LMuMOGWO outperforms the benchmark methods in many aspects, including minimum classification error rate, average number of selected features, and non-dominated solutions with minimum and maximum features. The reason is that LMuMOGWO, which embeds Lévy flight and a mutation mechanism, can effectively balance the exploitation and exploration tendencies of BMOGWO-S. These mechanisms improve the algorithm's tendency to produce smaller Lévy jumps around a preferable subset of features. Moreover, the proposed modifications, Lévy flight with a mutation operator, have shown their effectiveness in avoiding the local optima problem, especially when dealing with large datasets. If LMuMOGWO skips some informative features in the search space due to the long jumps of the Lévy flight, the mutation operator helps to add such features back.

This new mechanism is beneficial for exploring new regions close to the newly discovered non-dominated solutions. For this reason, the new algorithmic modifications have a strong ability to eliminate redundant features, unlike the benchmark approaches.

Dataset | LMuMOGWO | BMOGWO-S | BMOGWO-V | BMOPSO | NSGA-II |
---|---|---|---|---|---|
Breastcancer | 6.67 | 6.26 | 9.082 | 10.27 | 11.02 |
WineEW | 6.47 | 7.99 | 8.24 | 9.40 | 10.95 |
HeartEW | 7.18 | 7.69 | 8.54 | 12.76 | 10.16 |
Zoo | 6.85 | 5.59 | 7.78 | 8.38 | 10.07 |
Lymphography | 6.82 | 5.95 | 8.45 | 8.97 | 11.51 |
SpectEW | 6.50 | 5.62 | 7.78 | 6.79 | 10.70 |
BreastEW | 7.58 | 7.02 | 8.45 | 8.74 | 13.16 |
Ionosphere | 7.15 | 6.64 | 8.46 | 8.19 | 11.86 |
KrvskpEW | 13.29 | 10.55 | 13.50 | 25.69 | 32.17 |
WaveformEW | 12.07 | 11.70 | 16.12 | 19.35 | 27.71 |
SonarEW | 8.57 | 8.87 | 8.25 | 15.05 | 11.07 |
PenglungEW | 7.78 | 6.98 | 9.53 | 33.98 | 35.93 |

A fair comparison is ensured, as the population size, the maximum number of iterations, and the archive/repository size are the same for all compared algorithms. The need to calculate crowding distances and to rank non-dominated solutions generally increases the run time of BMOPSO and NSGA-II. LMuMOGWO uses only a Lévy flight and a one-phase mutation mechanism, without calculating crowding distances or implementing any ranking method, which shortens its execution time.

Furthermore, LMuMOGWO selects fewer features, resulting in less computational time, particularly on large datasets. Based on the overall outcomes, it is evident that LMuMOGWO outperforms the compared algorithms, especially in maximizing classification accuracy and reducing features. LMuMOGWO's exploration capability is expanded by the newly proposed mechanisms: the Lévy search steps and the mutation operator. The long jumps of the Lévy flight reduce the chance of falling into a local optimum, while the mutation operator reduces the risk that those long jumps skip over informative features.
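The paper states that the continuous search space is mapped to a binary one with the sigmoid function, and that a mutation operator can reintroduce informative features. A minimal sketch of both steps is shown below; the per-bit mutation rate is a hypothetical parameter, and the exact update order inside LMuMOGWO is not specified here.

```python
import numpy as np

def binarize_sigmoid(position, rng):
    """Map a continuous wolf position to a binary feature mask
    via the sigmoid transfer function (1 = feature selected)."""
    prob = 1.0 / (1.0 + np.exp(-position))
    return (rng.random(position.shape) < prob).astype(int)

def mutate(mask, rate, rng):
    """One-phase bit-flip mutation: flipping a 0 back to 1 can
    re-add an informative feature skipped by a long Lévy jump.
    `rate` is an illustrative per-bit flip probability."""
    flips = rng.random(mask.shape) < rate
    return np.where(flips, 1 - mask, mask)
```

For example, a strongly positive position component yields a selection probability near 1, so the corresponding feature is almost surely kept after binarization.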

As the results have demonstrated, one of the main strengths of LMuMOGWO is its promising exploration ability compared with our previously proposed BMOGWO-S. LMuMOGWO achieves this with the help of Lévy flight walks, which enhance the wolves' ability to discover promising regions of the fitness basins. The results have shown the efficiency and superiority of the LMuMOGWO algorithm in solving feature selection problems. The implemented Lévy flight distribution adapts to exploration and, on occasion, exploitation, which significantly helps LMuMOGWO jump out of local optima and find promising regions of the fitness basins. Moreover, the overall performance of LMuMOGWO is better than that of the other algorithms, including BMOGWO-V, BMOPSO, and NSGA-II, as it can explore areas that lead to more precise solutions. The two main factors enabling these features are the adaptive searching behavior inherited from BMOGWO-S and the improved Lévy flight random walks in LMuMOGWO.

Besides, the algorithm itself has fewer parameters than the compared algorithms and a small memory footprint, storing only a position vector, whereas MOPSO stores both a velocity vector and a position vector. Also, the grid mechanism implemented in MOGWO removes a stored solution once the archive is full and a superior solution is to be added, unlike BMOPSO and NSGA-II, which preserve the stored solutions, causing duplicated solutions; this in turn leads to rapid diversity loss, which contributes to premature convergence. Nevertheless, selecting a single solution from the attained non-dominated set is a crucial matter. For feature selection problems, reducing the number of selected features and maximizing classification accuracy are two conflicting objectives, where optimizing one may degrade the other (a trade-off issue). Several metrics exist to assess the performance of multi-objective techniques. However, since our research addresses feature selection as a discrete/binary multi-objective problem, no specific measures for comparing performance exist, as almost all such metrics are intended for continuous multi-objective problems. To summarize, the results of LMuMOGWO are promising for most of the tested datasets in achieving lower classification error rates, fewer selected features, and less computation time.


To evaluate the statistical significance of the differences in the average number of selected features obtained by the proposed LMuMOGWO algorithm and the other optimizers, the Wilcoxon test was utilized. This analysis aims to determine whether the results of the two compared runs are independent. The null hypothesis assumes that there is no significant difference between the average number of selected features obtained by the proposed LMuMOGWO optimizer and that of the other optimizers. A significance level greater than 5% confirms the null hypothesis, while a significance level less than 5% rejects it.

Comparison | p-value |
---|---|
LMuMOGWO | 0.00219 |
LMuMOGWO | 0.00144 |
LMuMOGWO | 0.00111 |
LMuMOGWO | 0.00111 |
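The Wilcoxon signed-rank analysis above can be reproduced with `scipy.stats.wilcoxon`. The feature-count arrays below are hypothetical placeholders, not the paper's measured values; the comparison against the 5% significance level mirrors the procedure described in the text.

```python
from scipy.stats import wilcoxon

# Hypothetical per-dataset average selected-feature counts for the
# proposed optimizer and one baseline (12 datasets, paired samples).
lmumogwo = [3.1, 4.2, 2.8, 5.0, 6.1, 3.9, 4.4, 7.2, 9.0, 5.5, 4.8, 6.3]
baseline = [4.0, 5.1, 3.5, 6.2, 7.0, 4.8, 5.3, 8.9, 11.2, 6.9, 5.7, 7.8]

# Paired, two-sided Wilcoxon signed-rank test
stat, p = wilcoxon(lmumogwo, baseline)
print(f"p-value = {p:.5f}")
if p < 0.05:
    print("Null hypothesis rejected: the difference is statistically significant.")
```

With only 12 paired observations, the test uses the exact null distribution of the signed ranks, which is why even uniformly small improvements can reach significance.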

This work addresses the feature selection problem by proposing an enhanced binary multi-objective grey wolf optimizer. Specifically, we combined the Lévy flight and mutation operators with the sigmoid transfer function for the feature selection search space to improve the global search of BMOGWO-S. Additionally, an Artificial Neural Network (ANN) classifier is utilized to assess the non-dominated solutions. The proposed LMuMOGWO was evaluated on twelve commonly used datasets with varying degrees of complexity and compared with four multi-objective approaches: two based on MOGWO and two other high-performance algorithms, BMOPSO and NSGA-II. The results show that, in most cases, LMuMOGWO produced better non-dominated solutions and surpassed the existing algorithms in classification accuracy and feature reduction. The study also found that LMuMOGWO can perform deep exploration, allowing it to obtain a set of non-dominated solutions rather than treating the inherently multi-objective feature selection problem as one with a single solution. Additionally, users can choose the solutions that satisfy their needs by inspecting the Pareto front generated by multi-objective optimization. Although the Pareto front of LMuMOGWO can achieve an ideal set of features, considerable room for improvement remains. Section 5 analyzes the performance of the different algorithms on datasets of varying sizes and verifies the superiority of the LMuMOGWO algorithm in solving multi-objective feature selection. In future work, we may demonstrate the effectiveness of the proposed LMuMOGWO algorithm on datasets with a significantly larger number of dimensions, or investigate the influence of feature correlation on classification results. In this paper, the ANN is used as the evaluator of the feature subset; in future work, if different wrapper schemes are considered, other classifiers can be used to evaluate the proposed LMuMOGWO method, which may yield better results.

This research was supported by Universiti Teknologi PETRONAS under the Yayasan Universiti Teknologi PETRONAS (YUTP) Fundamental Research Grant Scheme (YUTP-FRG/015LC0-274), "The Development of Data Quality Metrics to Assess the Quality of Big Datasets". We would also like to acknowledge the support of Researchers Supporting Project Number (RSP-2023/309), King Saud University, Riyadh, Saudi Arabia.

The authors declare that they have no conflicts of interest to report regarding the present study.