The massive growth of Internet traffic has made the composition of web traffic increasingly complex. Traditional port-based or protocol-based network traffic identification methods are no longer suitable for today’s complex and changing networks. Recently, machine learning has been widely applied to network traffic recognition. Still, high-dimensional features and redundant data in network traffic can lead to slow convergence and low identification accuracy in network traffic recognition algorithms. Taking advantage of the fast optimization-seeking capability of the jumping spider optimization algorithm (JSOA), this paper proposes a jumping spider optimization algorithm that incorporates Harris hawk optimization (HHO) and small hole imaging (HHJSOA), and applies it to feature selection for network traffic identification. First, the method incorporates the HHO escape energy factor and the hard siege strategy to form a new search strategy for HHJSOA. This position update strategy widens the range over which HHJSOA searches for the optimal solution. Small hole imaging is used to update inferior individuals. Next, the feature selection problem is encoded through a coding scheme for jumping spider individuals. Over multiple iterations, HHJSOA finds the optimal individual, which is used as the selected feature subset for KNN classification. Finally, we validate the classification accuracy and performance of HHJSOA using the UNSW-NB15 and KDD99 datasets. Experimental results show that, compared with other algorithms on the UNSW-NB15 dataset, the improvement is at least 0.0705, 0.00147, and 1 in accuracy, fitness value, and number of features, respectively. In addition, compared with other feature selection methods on the same datasets, the proposed algorithm has faster convergence, better optimization ability, and greater robustness.
Therefore, HHJSOA can improve classification accuracy and address the problem that network traffic recognition algorithms converge slowly and easily fall into local optima because of high-dimensional features.

Network traffic identification provides the foundation and basis for subsequent network protocol involvement, network operation management, and network traffic scheduling. It also provides means for detecting network attacks and traffic cleaning in network security [

The port-based traffic identification method had a significant advantage in the early days. This method matches network traffic to applications by inspecting the port number in TCP/UDP packet headers. However, as network applications have evolved, this simple and fast port identification method has become less and less applicable. Moore et al. [

Machine learning has now been applied to network traffic identification. However, most of the current network traffic classification methods are based on traditional machine learning approaches, and the classification performance depends on the design of traffic features. Designing a set of features that can accurately characterize the traffic requires a lot of manual experience and feature engineering skills. Traffic feature selection and extraction is the method’s most essential and computationally intensive part. Deep learning methods are used to build end-to-end deep learning models by autonomously learning from the original traffic set to high-dimensional features [

Considering the abovementioned problems, we propose a jumping spider optimization algorithm incorporating HHO and small hole imaging (HHJSOA) and apply it to feature selection for network traffic identification. A coding scheme for jumping spider individuals is proposed, and the optimal individual is found over multiple iterations of the algorithm. The optimal individual is used as the selected feature subset for KNN classification. Using the UNSW-NB15 and KDD99 datasets, we compare the proposed method with feature selection methods based on other algorithms to verify the effectiveness of HHJSOA. The main contributions of this paper are as follows:

A new feature selection method for network traffic identification is proposed. The method addresses the drawback that redundant data in network traffic datasets cause identification algorithms to converge slowly and fall into local optima. The performance of the proposed method is compared with other methods using benchmark test functions and two different datasets.

A new HHJSOA algorithm based on the JSOA algorithm is proposed. HHJSOA has robust global exploration and local exploitation capabilities.

The HHJSOA algorithm is applied as a feature selection method for network traffic identification. We use the binary version of HHJSOA to select the optimal combination of features, i.e., the optimal jumping spider individual.

Fusing HHO yields a novel search strategy for HHJSOA. Small hole imaging updates inferior jumping spider individuals to prevent JSOA from falling into local optima. The escape energy factor of HHO, the hard siege strategy, and small hole imaging play an essential role in the feature selection process of HHJSOA.

The rest of this paper is organized as follows:

In the current network environment, achieving high accuracy in network traffic identification is no longer possible by relying on port numbers alone. Traditionally, DPI is the most accurate classification technique. Sen et al. [

These traditional methods can meet the needs of network traffic classification to some extent but still have limitations for the current network environment.

To address the limitations of traditional network traffic identification methods, more and more machine learning methods are applied to network traffic identification. Moore et al. [

Intelligent optimization algorithms are self-organizing and have strong optimization capabilities. They can control the behavior of the whole group through the interaction of individuals without supervision, and they are easy to implement. Liu et al. [

From the no-free lunch theorem [

A new swarm intelligence algorithm called JSOA was proposed in 2021. It is conceptually simple, easy to implement, and efficient at finding the optimum. The attack and search behaviors of the jumping spider are switched randomly, and JSOA uses a position migration equation to update the positions of jumping spiders with a low pheromone rate.

The attack behavior of jumping spiders corresponds to the exploitation phase of the algorithm and comprises persecution attacks and jumping attacks. Jumping spiders choose between them according to the value of r, a random number in [0, 1]. When

When

The size of

The search behavior of the jumping spider corresponds to the exploration of the algorithm, including global and local search, and the search method is selected by the size of r.

When

When

Jumping spiders whose pheromone ratio in the current iteration is less than or equal to 0.3 are relocated. The model for calculating the pheromone ratio of jumping spiders is as follows:
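As a rough illustration of the relocation criterion, the sketch below assumes the min-max normalization commonly given for JSOA, $\text{pheromone}(i) = (F_{\max} - F_i) / (F_{\max} - F_{\min})$, where $F_i$ is the fitness of spider $i$ in a minimization problem. This formula, the function names, and the handling of the degenerate case are assumptions made for illustration and are not taken from the text above; only the 0.3 threshold is.

```python
def pheromone_rates(fitness_values):
    """Min-max normalised pheromone rate per spider (assumed JSOA form).

    The worst (largest) fitness maps to 0 and the best (smallest) to 1.
    """
    f_max, f_min = max(fitness_values), min(fitness_values)
    if f_max == f_min:
        # Assumption: if all spiders are equally fit, none counts as "low".
        return [1.0] * len(fitness_values)
    return [(f_max - f) / (f_max - f_min) for f in fitness_values]


def low_pheromone_indices(fitness_values, threshold=0.3):
    """Indices of spiders whose rate is <= threshold and would be relocated."""
    rates = pheromone_rates(fitness_values)
    return [i for i, rate in enumerate(rates) if rate <= threshold]


# Hypothetical fitness values for four spiders (lower is better).
example_fitness = [0.0, 5.0, 8.0, 10.0]
relocated = low_pheromone_indices(example_fitness)
```

Under these assumptions, only the two worst spiders in the toy population fall at or below the 0.3 threshold.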

This paper uses the KNN learning algorithm to calculate the classification accuracy of selected feature subsets. The KNN algorithm finds the K training points nearest to the sample to be predicted, measured by Euclidean distance, and predicts the majority class among those K points. The Euclidean distance is as follows:
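A minimal sketch of the KNN rule described above, using the standard Euclidean distance $d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$. The function names, the toy samples, and the choice K = 3 are illustrative assumptions, not values from the paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Standard Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples, for illustration only.
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = ["normal", "normal", "attack", "attack"]
prediction = knn_predict(train_X, train_y, (0.2, 0.1), k=3)
```

In this toy case, two of the three nearest neighbours of the query are labelled "normal", so that is the predicted class.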

The division of the search phase of the JSOA algorithm is based entirely on the value of random numbers, which prevents the global and local stages of the algorithm from being properly balanced. On the one hand, the search space of jumping spiders is more extensive, and it is difficult to miss critical search information. On the other hand, the local exploitation of JSOA is poor, and it easily falls into local optima. HHO is a swarm intelligence algorithm proposed in 2019 in which the prey’s energy decreases as the number of iterations increases, moving the algorithm into the local exploitation phase. Owing to Lévy flight, HHO is highly resistant to local optima and disturbances and can successfully prevent premature convergence. Therefore, this paper introduces the HHO prey escape energy factor E to form a seamless transition from global to local search in JSOA. The following is the equation for the escape energy factor E:
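For orientation, the escape energy in the original HHO formulation is $E = 2 E_0 (1 - t/T)$, where $E_0$ is drawn uniformly from $(-1, 1)$ at each iteration, $t$ is the current iteration, and $T$ is the maximum number of iterations. The sketch below assumes this standard form; the variant used in HHJSOA may differ, and the function and parameter names are hypothetical.

```python
import random


def escape_energy(t, max_iter, rng=random):
    """Standard HHO escape energy: E = 2 * E0 * (1 - t / T).

    E0 is a fresh uniform draw from [-1, 1], so the magnitude of E is
    bounded by an envelope that shrinks linearly from 2 to 0.
    """
    e0 = rng.uniform(-1.0, 1.0)
    return 2.0 * e0 * (1.0 - t / max_iter)


# The envelope 2 * (1 - t / T) at a few iterations, assuming T = 500.
envelope = [2.0 * (1.0 - t / 500) for t in (0, 250, 500)]
```

Because the envelope decays linearly, large values of |E| can only occur early in the run, which is what lets the energy act as a switch from global to local search.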

According to the escape energy factor E, we divide the jumping spider predation process into an intensive search phase, a search phase, and an attack phase. This search strategy addresses two shortcomings of JSOA: the imbalance between its global exploration and local exploitation stages, and its tendency to fall into local optima.

Several steps of the algorithm are listed below:

When E > 1, the jumping spiders jump and search for prey; the prey is detected and hides in the nearby area, and the population updates its position using

When 0.5 < E < 1, the jumping spiders search the area where the prey might be hidden. As the number of iterations increases, the jumping spiders keep searching for the target while the target keeps escaping, and the prey’s energy value E decreases until E < 0.5, i.e., the target is exhausted, and the population updates its position using the

When E < 0.5, the prey’s energy is depleted, and according to the persecutory attack
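The three cases above amount to a simple dispatch on the escape energy. The sketch below assumes that the magnitude |E| is compared against the thresholds (the standard HHO energy can be negative) and that the boundary values E = 1 and E = 0.5, which the text does not assign, go to the lower phase. The phase labels are hypothetical names.

```python
def select_phase(energy):
    """Map an escape energy value to one of the three predation phases."""
    magnitude = abs(energy)  # assumption: thresholds apply to |E|
    if magnitude > 1.0:
        return "intensive_search"  # jump and search for prey
    if magnitude > 0.5:
        return "search"            # search the area where prey may hide
    return "attack"                # prey exhausted: attack


phases = [select_phase(e) for e in (1.5, 0.8, 0.2)]
```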

However, the persecution attack exchanges information only between the current individual and random individuals and lacks global exploration. Toward the end of the iterations, the escape energy factor E is always less than one, which can lead the algorithm into a local optimum. Therefore, we introduce the hard siege equation of HHO. The equation relies on the current position and the optimal individual to achieve a more diverse information interaction process. The following is the jumping spider hard siege equation:
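In the original HHO, the hard besiege update is $X(t+1) = X_{best}(t) - E\,|X_{best}(t) - X(t)|$, applied per dimension. The sketch below implements that standard form as a reference point; the jumping spider version used in HHJSOA may add or change terms, and the function name is hypothetical.

```python
def hard_besiege(position, best, energy):
    """Standard HHO hard besiege step, applied dimension by dimension.

    The new position is placed around the best individual, offset by the
    escape energy times the current distance to the best individual.
    """
    return [b - energy * abs(b - x) for x, b in zip(position, best)]


# Hypothetical 2-D example: the step shrinks as |E| shrinks.
new_position = hard_besiege([0.0, 2.0], [1.0, 1.0], 0.5)
```

With E = 0 the update returns the best position itself, so the step contracts toward the best individual as the energy decays.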

If the child performs worse than the parent, the parent is carried into the next generation without any improvement, according to the flow of the JSOA algorithm. This type of update wastes the computational resources spent on the iteration, slows convergence, and lowers the accuracy of the converged solution. The process is assumed to enter the next iteration without improving the parent. In this case, the small hole imaging concept [

The dynamically changing inertia weight
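As a hedged illustration of the small hole imaging idea, the sketch below uses the form commonly found in the opposition-based learning literature, $x^{*} = \frac{a+b}{2} + \frac{a+b}{2n} - \frac{x}{n}$, where $[a, b]$ are the bounds of a dimension and $n$ is a scaling factor. With $n = 1$ this reduces to ordinary opposition, $a + b - x$. The exact equation and weights used in HHJSOA are not reproduced here, and the greedy acceptance step and all names are assumptions.

```python
def small_hole_image(position, lower, upper, n=1.0):
    """Image of a position through a 'small hole' at the centre of the bounds."""
    return [(lo + hi) / 2 + (lo + hi) / (2 * n) - x / n
            for x, lo, hi in zip(position, lower, upper)]


def refresh_stalled(position, lower, upper, fitness, n=1.0):
    """Replace a stalled individual by its image only if the image is fitter."""
    candidate = small_hole_image(position, lower, upper, n)
    # Clip to the bounds, since n < 1 can push the image outside them.
    candidate = [min(max(c, lo), hi)
                 for c, lo, hi in zip(candidate, lower, upper)]
    return candidate if fitness(candidate) < fitness(position) else position


# Hypothetical 1-D problem on [0, 10] whose minimum lies at x = 7.
toy_fitness = lambda p: (p[0] - 7.0) ** 2
refreshed = refresh_stalled([3.0], [0.0], [10.0], toy_fitness)
```

Here the image of 3 with n = 1 is 7, which has better fitness, so the stalled individual is replaced rather than carried over unchanged.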

The hybrid algorithm forms a new jumping spider search strategy. The global exploration capability of the original algorithm is retained, while convergence is faster and exploitation is stronger. The HHJSOA process is shown in

For the feature selection problem in traffic recognition, this paper uses the following main processes to obtain a better feature subset and higher classification accuracy. The flowchart of HHJSOA for the network traffic identification feature selection method is given in

When the HHJSOA algorithm is used for feature selection, each jumping spider is a search agent. Each jumping spider is encoded as a binary string that represents a feature selection scheme, so feature selection amounts to choosing a suitable 0/1 string. Each dimension takes the value 1 (selected) or 0 (unselected). As shown in

Step 1: Load the dataset, pre-clean, and generalize the training data to get the collated dataset D.

Step 2: Set the search agent population size N, the maximum number of iterations T, and the initialization parameters.

Step 3: Initialize the position of each jumping spider in the jumping spider population; since the jumping spider’s positions are decimal numbers, they need to be mapped to 0 or 1 by the mapping function, shown in

According to the mapping scheme, let the jumping spider position of HHJSOA be

Step 4: Binary decoding is performed for each jumping spider individual, and the jumping spider individual position is updated according to the HHJSOA search strategy; the jumping spider individual is mapped between [0, 1] according to

Step 5: Decode each jumping spider individual again, compare the updated jumping spider individual fitness value to the fitness value of the original jumping spider individual, and keep the better jumping spider individual.

Step 6: Determine whether the iteration reaches the termination condition; if yes, end the iteration. Otherwise, execute Step 4.
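The decoding and evaluation in Steps 3 to 5 can be sketched as follows. The mapping function and the fitness function are not reproduced in the text above, so this sketch assumes a sigmoid transfer function with a fixed 0.5 threshold and a common wrapper fitness, `alpha * error + (1 - alpha) * selected / total` with `alpha = 0.99`. Both choices, the toy data, and all names are illustrative assumptions rather than the paper's definitions.

```python
import math
from collections import Counter


def to_mask(position, threshold=0.5):
    """Map a real-valued position to a 0/1 feature mask (assumed sigmoid)."""
    mask = []
    for x in position:
        x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
        mask.append(1 if 1.0 / (1.0 + math.exp(-x)) > threshold else 0)
    return mask


def knn_accuracy(train, test, mask, k=3):
    """Accuracy of a k-NN classifier restricted to the selected features."""
    cols = [j for j, bit in enumerate(mask) if bit]

    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in cols))

    correct = 0
    for features, label in test:
        nearest = sorted(train, key=lambda row: dist(row[0], features))[:k]
        vote = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        correct += vote == label
    return correct / len(test)


def fitness(position, train, test, alpha=0.99):
    """Assumed wrapper fitness: lower is better."""
    mask = to_mask(position)
    if not any(mask):
        return 1.0  # assumption: an empty subset gets the worst score
    error = 1.0 - knn_accuracy(train, test, mask)
    return alpha * error + (1.0 - alpha) * sum(mask) / len(mask)


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
train = [((0.0, 5.0), "normal"), ((0.1, 1.0), "normal"), ((0.2, 9.0), "normal"),
         ((1.0, 5.2), "attack"), ((0.9, 1.1), "attack"), ((1.1, 8.8), "attack")]
test = [((0.05, 9.1), "normal"), ((0.95, 4.9), "attack")]

fit_informative = fitness([2.0, -2.0], train, test)  # selects feature 0 only
fit_all = fitness([2.0, 2.0], train, test)           # selects both features
```

On this toy data the individual that drops the noisy feature receives the lower (better) fitness, which is the comparison Step 5 relies on when keeping the better individual.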

This section uses the UNSW-NB15 dataset for comparison experiments. UNSW-NB15 has 2540004 records with 49 features, and its records fall into ten categories: one normal type and nine attack types.

| Function | Scope | D | Min |
|---|---|---|---|
| | [−100, 100] | 30, 50, 100 | 0 |
| | [−1.2, 1.2] | 2 | 0 |
| | [−10, 10] | 30, 50, 100 | 0 |
| | [−10, 10] | 30, 50, 100 | 0 |
| | [−5, 5] | 2 | 0 |
| | [−10, 10] | 2 | 0 |

The KDD99 dataset contains 41 features, of which 38 are numeric and 3 are character-based. Each connection record in the dataset can be categorized as “Normal,” “DOS,” “R2L,” “U2R,” or “Probing.” We use the 10% subset kddcup.data_10_percent_corrected as the training set and the file corrected as the test set. After that, 10% of the data are randomly selected from the training and test sets as the experimental dataset.

Six standard test functions were chosen to test whether the HHJSOA algorithm significantly improved performance. The search space, dimensionality, and theoretical optimum of the test functions are described in

HHJSOA was compared with JSOA, jumping spider optimization with chaotic drifts (CJSOA) [

Func | HHJSOA | CJSOA | NCHHO |
---|---|---|---|

2.19E − 158 ± 0.00E + 00 | 1.80E − 217 ± 0.00E + 00 | ||

2.07E − 04 ± 5.46E + 04 | |||

9.49E − 97 ± 4.49E − 96 | 6.17E − 125 ± 3.23E − 124 | ||

4.79E − 185 ± 0.00E + 00 | 4.47E − 254 ± 0.00E + 00 | ||

4.77E − 168 ± 0.00E + 00 | 2.90E − 251 ± 0.00E + 00 | ||

1.10E − 25 |
1.46E − 02 ± 3.80E − 02 | 4.79E − 05 ± 1.43E − 04 | |

Func | IGWO | JSOA | TSA |

6.32E − 04 ± 9.62E − 04 | 1.10E − 117 ± 6.02E − 117 | 3.86E − 04 ± 1.67E − 03 | |

4.44E − 08 ± 4.30E − 08 | 5.36E − 04 ± 1.71E − 03 | 1.01E − 01 ± 2.89E − 01 | |

3.69E − 04 ± 4.01E − 04 | 2.21E − 77 ± 8.46E − 77 | 2.95E + 01 ± 5.43E + 00 | |

7.23E − 84 ± 2.31E − 83 | 3.82E − 141 ± 1.23E − 140 | 3.34E − 57 ± 1.82E−56 | |

1.25E − 255 ± 0.00E + 00 | 7.60E − 133 ± 4.15E − 132 | 1.11E − 91 ± 6.09E − 91 | |

5.60E − 05 ± 1.08E − 04 | 2.30E − 01 ± 4.98E − 01 |

As seen from

To make a more objective evaluation of the algorithms, we used the Friedman and Wilcoxon tests in mathematical statistics to make an overall comparative analysis of the convergence ability of the algorithms.
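The Friedman comparison is based on the mean rank of each algorithm across test problems. The sketch below computes only those mean ranks, with tied results sharing the average rank. It does not compute the test statistic or p-value, and the error values are hypothetical numbers chosen for illustration, not results from this paper.

```python
def mean_ranks(results):
    """Mean rank of each algorithm over all problems (lower error ranks first).

    results maps an algorithm name to a list of errors, one per problem.
    """
    names = list(results)
    n_problems = len(next(iter(results.values())))
    totals = dict.fromkeys(names, 0.0)
    for p in range(n_problems):
        ordered = sorted(names, key=lambda name: results[name][p])
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over the run of algorithms tied with ordered[i].
            while (j + 1 < len(ordered)
                   and results[ordered[j + 1]][p] == results[ordered[i]][p]):
                j += 1
            average_rank = (i + j) / 2 + 1  # ranks are 1-based
            for name in ordered[i:j + 1]:
                totals[name] += average_rank
            i = j + 1
    return {name: totals[name] / n_problems for name in names}


# Hypothetical errors of three algorithms on three problems.
toy_results = {"A": [0.1, 0.2, 0.0], "B": [0.3, 0.2, 0.5], "C": [0.2, 0.9, 0.7]}
ranks = mean_ranks(toy_results)
```

A lower mean rank indicates better overall performance, which is how the ranking columns in the following table should be read.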

Dimension | D = 30, 2 | D = 50, 2 | D = 100, 2 |
---|---|---|---|
Algorithm | Ranking | Ranking | Ranking |
HHJSOA | 1.71 | 1.69 | 1.69 |
CJSOA | 2.85 | 2.84 | 2.75 |
JSOA | 3.95 | 3.95 | 3.94 |
NCHHO | 2.71 | 2.69 | 2.68 |
IGWO | 4.20 | 4.24 | 4.28 |
TSA | 5.59 | 5.59 | 5.66 |

Dimension | D = 30, 2 | D = 50, 2 | D = 100, 2 |
---|---|---|---|
HHJSOA | | | |
CJSOA | 4.05E−02 | 4.62E−02 | 5.77E−02 |
NCHHO | 1.36E−02 | 1.54E−02 | 1.47E−02 |
IGWO | 4.77E−05 | 2.22E−05 | 8.40E−06 |
JSOA | 1.71E−03 | 1.82E−03 | 1.70E−03 |
TSA | 4.58E−06 | 2.81E−06 | 1.28E−06 |

As seen from

Set the population size N = 30 and the maximum number of iterations T = 500.

As seen from

As seen from

To show the experimental results more clearly, we record each algorithm’s accuracy, fitness value, and number of selected features after 20 independent runs in

Algorithm | Accuracy (%) | Fitness value | Number of features |
---|---|---|---|
HHO | 80.30 | 9.60E−03 | 6 |
EO | 85.45 | 6.92E−03 | 6 |
WOA | 75.10 | 9.83E−03 | 7 |
GWO | 79.20 | 6.78E−03 | 7 |
JSOA | 82.40 | 8.88E−03 | 6 |
HHJSOA | 92.50 | 5.31E−03 | 6 |

As seen from

Traffic identification is crucial for network structure design, operation management, quality assurance, and security. To address the problem that high-dimensional features make machine learning algorithms slow to converge and make classification algorithms prone to local optima, this paper proposes an improved jumping spider optimization algorithm, HHJSOA. First, we fuse HHO and use the escape energy factor and hard siege strategy to balance the global and local search of JSOA. Second, inferior individuals are updated using small hole imaging. Together, these changes address the poor convergence of the original JSOA and its tendency to fall into local optima. The effectiveness of HHJSOA is verified on six benchmark functions, and the algorithm is then applied to the feature selection problem in network traffic identification. The results show that HHJSOA performs best compared with the HHO, EO, WOA, GWO, and JSOA algorithms on the UNSW-NB15 and KDD99 datasets. However, the classification accuracy of HHJSOA is still below 95%. The classification effectiveness of the HHJSOA-based feature selection method depends on the evaluator of the optimal feature subset, and in this paper the selected subset is evaluated only with the KNN classification algorithm. In future work, other classification algorithms can be added for comparative study to choose a classifier better suited to HHJSOA for high-dimensional network traffic identification feature selection.

The authors would like to thank the anonymous reviewers for their valuable comments, which improved the quality of our original manuscript.

This work is funded by the National Natural Science Foundation of China under Grant No. 61602162.

Study conception and design: X.H, H.Y, C.W, and H.L; data collection: H.Y; analysis and interpretation of results: X.H, H.Y, and C.W; draft manuscript preparation: X.H, H.Y, C.W, and H.L. All authors reviewed the results and approved the final version of the manuscript.

The datasets used in this study are all publicly available datasets, which you can access in the following websites.

The authors declare that they have no conflicts of interest to report regarding the present study.