To improve the accuracy of performance analysis of the regional airline network, this study applies complex network theory and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to investigate the topology of the regional airline network, constructs an index system of node importance, and clusters the 161 airport nodes of the network. The entropy weight method and the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) are then applied to comprehensively evaluate the importance of the airport nodes, classify the nodes, and identify the key nodes. Network efficiency, maximum connected subgraph size, and network connectivity are adopted as vulnerability indices, and their changes are observed when the key nodes are deliberately attacked and when 137 nodes are randomly attacked. The results show that when the critical nodes of the regional airline network are deliberately attacked, the maximum connected subgraph index declines more slowly, while the network efficiency and connectivity indices decline faster. When the nodes of the four categories are randomly attacked, the network efficiency index declines faster and the maximum connected subgraph index declines more slowly. Finally, it is proposed that critical nodes be identified and given priority protection to improve the security of the regional airline system.

Regional airlines are an important part of the national route network [

Scholars in China and abroad have proposed various methods for evaluating node importance, such as single-indicator evaluation. P. Bonacich proposed degree centrality based on the node degree value [

Clustering is an important component of unsupervised learning in the field of machine learning, and clustering algorithms have important applications in industry and have also received extensive attention and research in academia [

Recently, several scholars have applied complex network theory to airport networks. Yaru Dang constructed a Chinese air passenger flow network and used deliberate attacks based on three centrality measures to explore the network's resistance to destruction [

Existing studies of aviation network performance rely on complex network topology alone: key nodes are selected by relatively simple criteria without a comprehensive index system, and network performance is analyzed with a single index. In fact, the importance of an airport node also depends on the attributes of the airport and its region, so node importance is assessed more reasonably by jointly considering multi-level indicators. In addition, this study selects several indicators of network performance to explore how the performance of the regional airline network degrades and to propose optimization suggestions.

In this paper, airports with fewer than 2 million passengers per year are defined as feeder airports, and routes with a feeder airport at one or both ends are defined as feeder routes. All feeder routes together form the feeder airline network. The undirected regional airline network diagram, drawn with the Gephi visualization software, is shown in

There are 161 nodes and 2290 edges in the figure. Nodes represent regional airports, and edges represent direct connections between two airports. Node size reflects degree: the higher the degree, the larger the node.
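The feeder-network definition above amounts to a simple filter over airports and routes. The Python sketch below illustrates it under stated assumptions: the input format (a dictionary of annual passenger counts and a list of airport-code pairs) and the function name `feeder_network` are hypothetical, and every endpoint of a feeder route is kept as a node, which the text does not state explicitly.

```python
# Minimal sketch: build the undirected feeder airline network from
# hypothetical annual passenger counts and a list of routes.
FEEDER_THRESHOLD = 2_000_000  # passengers per year, per the definition above


def feeder_network(throughput, routes, threshold=FEEDER_THRESHOLD):
    """Return (nodes, edges) of the undirected feeder airline network.

    throughput: dict mapping airport code -> annual passengers (assumed format)
    routes: iterable of (airport_a, airport_b) pairs; duplicates and opposite
            directions collapse into one edge because the network is undirected.
    """
    feeder_airports = {a for a, pax in throughput.items() if pax < threshold}
    edges = set()
    for a, b in routes:
        if a == b:
            continue  # ignore self-loops
        # A feeder route has a feeder airport at one or both ends.
        if a in feeder_airports or b in feeder_airports:
            edges.add(frozenset((a, b)))
    # Assumption: every endpoint of a feeder route becomes a node.
    nodes = set().union(*edges) if edges else set()
    return nodes, edges


if __name__ == "__main__":
    pax = {"A": 1_500_000, "B": 30_000_000, "C": 900_000, "D": 25_000_000}
    routes = [("A", "B"), ("B", "D"), ("C", "A"), ("B", "A")]
    nodes, edges = feeder_network(pax, routes)
    print(sorted(nodes), len(edges))
```

In this toy input, the hub-to-hub route B-D is dropped, and A-B and B-A merge into a single undirected edge.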

Identifying the key airport nodes is important for the vulnerability analysis of the regional airline network. To establish a comprehensive index system of airport node importance, the selected indicators should be comprehensive and representative: they should reflect not only the topological properties of the complex network but also the attributes of the airport and of the region where it is located. Therefore, the selected indicators are shown in

Level 1 indicators | Secondary indicators | Indicator description |
---|---|---|
Topology | V1 Degree | Number of edges connected to the node |
Topology | V2 Betweenness | Sum, over all pairs of other nodes, of the fraction of their shortest paths that pass through the node |
Airport factors | V3 Throughput | Number of passengers arriving at and departing from the airport by air |
Airport factors | V4 Landing and takeoff times | Number of flight landings and takeoffs per unit time |
Urban factors | V5 Prosperity | GDP level of the region to which the airport belongs |
Urban factors | V6 Population | Resident population in the area where the airport is located |

The greater the degree of a node in the airport network, the more routes the airport has and the more important it is. Node betweenness reflects the role and influence of a node in the network and is an important global geometric quantity.

Airport attributes are reflected by passenger throughput and landing and takeoff times. Passenger throughput is one of the most intuitive indicators of an airport's capacity and activity: airports with higher throughput have more flights and are more important. City attributes include prosperity and population. A city's population reflects its consumption demand and spending power: the larger the population of the city where the airport is located, the higher the demand for air travel. Likewise, the higher the city's GDP, the greater the residents' spending power, the more likely they are to travel by air, and the more important the airport. Therefore, the selected indicators are presented in

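The V1 degree of node i is calculated as $k_i = \sum_{j} a_{ij}$, where $a_{ij}$ is the element of the adjacency matrix ($a_{ij} = 1$ if airports i and j are directly connected, and 0 otherwise) and $k_i$ denotes the degree of node i.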

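The V2 betweenness of node i is calculated as $B_i = \sum_{k<j,\; k,j \neq i} \frac{g_{kj}(i)}{g_{kj}}$, where $g_{kj}$ is the number of shortest paths between nodes k and j, and $g_{kj}(i)$ is the number of those paths that pass through node i.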
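Both topology indicators can be computed directly from the adjacency matrix. The pure-Python sketch below is one way to do it, assuming an unweighted, undirected network stored as a 0/1 matrix; the function names are hypothetical. Betweenness is computed with Brandes' breadth-first algorithm and returned unnormalized, whereas the scaling behind the betweenness values reported in the data table is not stated.

```python
from collections import deque


def degrees(adj):
    """Degree k_i = sum_j a_ij of every node in a symmetric 0/1 adjacency matrix."""
    return [sum(row) for row in adj]


def betweenness(adj):
    """Unnormalized betweenness B_i = sum over pairs k < j (k, j != i) of g_kj(i) / g_kj.

    Brandes' algorithm for an unweighted, undirected graph: one BFS per source
    counts shortest paths, then dependencies are accumulated in reverse BFS order.
    """
    n = len(adj)
    neighbours = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    bc = [0.0] * n
    for s in range(n):
        order = []                      # nodes in non-decreasing distance from s
        preds = [[] for _ in range(n)]  # predecessors on shortest paths from s
        sigma = [0] * n                 # number of shortest paths from s
        dist = [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbours[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {k, j} was counted once from k and once from j.
    return [b / 2 for b in bc]


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # airports 0-1-2 in a line
    print(degrees(path), betweenness(path))
```

For the three-airport line, the middle node lies on the only shortest path between the two ends, so its betweenness is 1 and the ends have 0.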

Commonly used vulnerability metrics include network efficiency, network connectivity, and maximum connected subgraph size. Network efficiency measures how efficiently traffic can travel between pairs of nodes, network connectivity measures the overall working performance of the network, and maximum connected subgraph size measures how much of the network remains connected.

Network efficiency is expressed as:

The network connectivity is expressed as:

The formula _{G} denotes the number of edges of the network G, and

The relative size of the maximum connected subgraph is expressed as $N'/N$, where $N$ and $N'$ are the sizes of the maximum connected subgraph before and after node damage, respectively, that is, the number of nodes in the largest connected subgraph of the network.
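The efficiency and maximum connected subgraph indices can be sketched as follows. This sketch assumes the widely used global-efficiency definition of Latora and Marchiori, $E = \frac{1}{N(N-1)}\sum_{i \neq j} 1/d_{ij}$ with $1/d_{ij} = 0$ for disconnected pairs, which may differ in detail from the formula used in the study. The network connectivity index is omitted because its exact form is not given here. The graph is assumed to be a dictionary mapping each airport to the set of its neighbours.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source; graph maps node -> set of neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def global_efficiency(graph):
    """E = 1/(N(N-1)) * sum over ordered pairs i != j of 1/d_ij (1/inf = 0)."""
    n = len(graph)
    if n < 2:
        return 0.0
    total = 0.0
    for s in graph:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += 1.0 / d  # unreachable nodes never appear, i.e. add 0
    return total / (n * (n - 1))


def largest_component_size(graph):
    """Number of nodes in the largest connected subgraph."""
    seen, best = set(), 0
    for s in graph:
        if s not in seen:
            component = bfs_distances(graph, s)
            seen.update(component)
            best = max(best, len(component))
    return best


if __name__ == "__main__":
    before = {0: {1}, 1: {0, 2}, 2: {1}}  # path 0-1-2
    after = {0: set(), 2: set()}          # node 1 removed
    print(global_efficiency(before))
    print(largest_component_size(after) / largest_component_size(before))
```

The last line prints the ratio $N'/N$ for the toy example, where removing the middle node leaves two isolated airports.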

The entropy-weight TOPSIS method combines the entropy weight method with the TOPSIS model. It determines the weight of each indicator from the amount of information reflected by the dispersion of that indicator's values, and it calculates each evaluation object's relative closeness to the optimal solution from its distances to the positive and negative ideal solutions.

Step 1: Standardize the indicators. Let $x_{ij}$ denote the original value of the jth indicator for the ith node.

Positive indicators:

where i denotes the node, j denotes the indicator, and i and j are positive integers.

Step 2: Normalize the standardized values to calculate the proportion of the ith node under the jth indicator.

Step 3: Calculate the information entropy $e_j$ of the jth indicator.

Step 4: Calculate the coefficient of variation of each indicator (one minus its information entropy) and, from it, the weight of each indicator.

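Step 5: Construct the weighted normalized decision matrix $Z = (z_{ij})_{m \times n}$, where $z_{ij} = w_j r_{ij}$, $w_j$ is the weight of the jth indicator, and $r_{ij}$ is the standardized value of the jth indicator for the ith node.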

Step 6: Determine the positive ideal solution and the negative ideal solution. For each indicator, the larger the element $z_{ij}$ of the weighted normalized decision matrix, the better the solution.

Step 7: Calculate the distances $S_i^{+}$ and $S_i^{-}$ from each node to the positive and negative ideal solutions, respectively.

Step 8: Calculate the relative closeness $C_i = S_i^{-}/(S_i^{+} + S_i^{-})$ of each node and rank the nodes; the larger $C_i$, the more important the node.
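Steps 1 to 8 can be combined into one function. The pure-Python sketch below is illustrative only: it assumes min-max standardization for the positive indicators in Step 1 (the formula is not reproduced in the text), Euclidean distances in Step 7, and a hypothetical function name `entropy_topsis`.

```python
import math


def entropy_topsis(X):
    """Entropy-weight TOPSIS for an m x n matrix X (m nodes, n indicators).

    All indicators are treated as positive (larger is better).
    Returns (weights, closeness), with each closeness C_i in [0, 1].
    """
    m, n = len(X), len(X[0])
    # Step 1 (assumed min-max standardization of positive indicators).
    R = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [X[i][j] for i in range(m)]
        lo, hi = min(col), max(col)
        for i in range(m):
            R[i][j] = (X[i][j] - lo) / (hi - lo) if hi > lo else 0.0
    # Steps 2-4: proportions p_ij, entropy e_j, divergence 1 - e_j, weights w_j.
    k = 1.0 / math.log(m)  # requires m >= 2
    divergence = []
    for j in range(n):
        col_sum = sum(R[i][j] for i in range(m))
        e_j = 1.0  # a constant column carries no information
        if col_sum > 0:
            e_j = -k * sum(
                (R[i][j] / col_sum) * math.log(R[i][j] / col_sum)
                for i in range(m) if R[i][j] > 0  # 0 * ln 0 is taken as 0
            )
        divergence.append(1.0 - e_j)
    total = sum(divergence)
    w = [d / total for d in divergence]
    # Step 5: weighted normalized matrix z_ij = w_j * r_ij.
    Z = [[w[j] * R[i][j] for j in range(n)] for i in range(m)]
    # Step 6: positive and negative ideal solutions (column max / min).
    z_pos = [max(Z[i][j] for i in range(m)) for j in range(n)]
    z_neg = [min(Z[i][j] for i in range(m)) for j in range(n)]
    # Steps 7-8: distances S_i^+ and S_i^-, then closeness C_i.
    closeness = []
    for row in Z:
        s_pos, s_neg = math.dist(row, z_pos), math.dist(row, z_neg)
        closeness.append(s_neg / (s_pos + s_neg) if s_pos + s_neg > 0 else 0.0)
    return w, closeness


if __name__ == "__main__":
    weights, C = entropy_topsis([[1, 2], [2, 3], [3, 5]])
    print(weights, C)
```

A node that is best on every indicator coincides with the positive ideal solution and receives $C_i = 1$. Applied to the partial data table later in this section, the function would not reproduce the reported weights, which were computed over all 161 nodes.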

DBSCAN is a representative density-based clustering method. It partitions data into clusters according to data density, determines the number of clusters automatically without initial values, and can discover clusters of arbitrary shape. Given a sample set $D = \{x_1, x_2, \ldots, x_m\}$, the parameters (ε, MinPts) describe how densely the samples are distributed in a neighborhood: ε is the neighborhood distance threshold of a sample, and MinPts is the threshold on the number of samples within distance ε of a sample. In the steps below, ε is written as Eps. The clustering process of the DBSCAN algorithm can be expressed as follows.

(1) Arbitrarily select a data point p from the data set.

(2) If p is a core point with respect to the parameters Eps and MinPts, find all data points that are density-reachable from p; together they form a cluster.

(3) If p is a border point (or a noise point), no points are density-reachable from it, so select another data point.

(4) Repeat steps (2) and (3) until all points have been processed.
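The four steps above can be sketched in pure Python as follows. This is a minimal illustration, not the implementation used in the study: it assumes Euclidean distance on numeric feature vectors, counts a point as its own neighbor when checking MinPts (a common convention), and uses a brute-force neighborhood search, which is adequate for a few hundred nodes. The function name `dbscan` is hypothetical.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return a cluster label (0, 1, ...) or NOISE (-1) for every point.

    A core point has at least min_pts points (itself included) within eps.
    """
    n = len(points)

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not yet processed"
    cluster = -1
    for i in range(n):  # step (1): take the next unprocessed point
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:  # step (3): not a core point
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # step (2): grow a new cluster from core point i
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # j is also core: expand through it
                queue.extend(j_neighbours)
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

Because the six indicators are on very different scales (compare the landing and takeoff counts with the betweenness values in the data table), the feature vectors would normally be standardized before clustering; the text does not say which scaling was used.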

In this study, the U1 (topology) indicators are calculated from data obtained from the Civil Aviation Advance Flight Plan Management System of the Civil Aviation Administration of China (CAAC), using the domestic flight schedule for China's 2020 summer and autumn seasons, which contains 2148 flight segments among the 161 airport nodes. The U2 (airport) indicator data are obtained from Civil Aviation from Statistics, and the U3 (urban) indicator data from the official website of the National Bureau of Statistics. Part of the node-importance data of the regional airline network is summarized in

Airport | Degree | Betweenness | Throughput/10,000 passengers | Landing and takeoff times | Prosperity (GDP)/100 million yuan | Population/10,000 people |
---|---|---|---|---|---|---|
ZSLG | 21 | 3.356 | 192 | 11826 | 3277 | 460 |
ZHXF | 11 | 1.355 | 190 | 73847 | 4601 | 526 |
ZBCF | 14 | 1.729 | 189 | 12733 | 1763 | 403 |
ZULZ | 31 | 3.094 | 186 | 23629 | 2157 | 425 |
ZPDL | 28 | 3.758 | 177 | 13734 | 1484 | 334 |
ZWAK | 16 | 1.822 | 171 | 14831 | 1315 | 271 |
ZYYJ | 17 | 1.226 | 166 | 5740 | 727 | 55 |
ZUMT | 22 | 2.829 | 165 | 17845 | 1092 | 66 |
ZWTN | 15 | 1.334 | 160 | 11063 | 406 | 253 |
ZGZH | 16 | 1.108 | 157 | 10991 | 3176 | 416 |
ZHLY | 13 | 0.942 | 154 | 180286 | 5128 | 706 |
ZHSY | 10 | 0.434 | 152 | 11369 | 1915 | 321 |
ZSZS | 17 | 1.911 | 152 | 20250 | 1512 | 116 |
ZSJG | 13 | 0.480 | 149 | 4432 | 4494 | 836 |
ZWYN | 12 | 0.689 | 148 | 8056 | 1266 | 56 |
ZHES | 11 | 0.552 | 143 | 8925 | 1117 | 346 |
ZUYI | 20 | 1.646 | 138 | 12536 | 493 | 100 |
ZSLQ | 13 | 0.888 | 138 | 9498 | 5262 | 662 |
ZPTC | 22 | 1.555 | 137 | 8420 | 716 | 64 |

Following the DBSCAN procedure, the key parameters Eps and MinPts are first set. A point p is then arbitrarily selected from the sample set X of data to be clustered. If p satisfies the core-object condition, all data points density-reachable from p form a cluster, and data points that do not belong to any cluster are marked as noise points [

The clustering results are shown in

The entropy-weight TOPSIS method was applied to the 161 nodes. The entropy weights of the six secondary indicators are 0.103667, 0.238827, 0.113405, 0.255894, 0.131915, and 0.156294. The resulting ranking is shown in

Airport | $S_i^{+}$ | $S_i^{-}$ | $C_i$ | Rank | Airport | $S_i^{+}$ | $S_i^{-}$ | $C_i$ | Rank |
---|---|---|---|---|---|---|---|---|---|
ZHLY | 0.0665 | 0.1535 | 0.6978 | 1 | ZBHD | 0.1331 | 0.0587 | 0.306 | 13 |
ZSRZ | 0.0985 | 0.0972 | 0.4968 | 2 | ZUXC | 0.1472 | 0.0640 | 0.3031 | 14 |
ZHXF | 0.1043 | 0.0793 | 0.4318 | 3 | ZBMZ | 0.1544 | 0.0665 | 0.3012 | 15 |
ZPDL | 0.1387 | 0.0924 | 0.3997 | 4 | ZLAK | 0.1338 | 0.0557 | 0.294 | 16 |
ZULZ | 0.1304 | 0.0826 | 0.3879 | 5 | ZSZS | 0.1423 | 0.0523 | 0.2688 | 17 |
ZSLG | 0.1380 | 0.0859 | 0.3836 | 6 | ZUGU | 0.1383 | 0.0485 | 0.2597 | 18 |
ZGWZ | 0.1217 | 0.0710 | 0.3684 | 7 | ZBCF | 0.1467 | 0.0511 | 0.2585 | 19 |
ZGCD | 0.1234 | 0.0639 | 0.3412 | 8 | ZWAK | 0.1458 | 0.0508 | 0.2583 | 20 |
ZUMT | 0.1402 | 0.0709 | 0.3359 | 9 | ZSWF | 0.1604 | 0.0543 | 0.2529 | 21 |
ZHNY | 0.1335 | 0.0616 | 0.3157 | 10 | ZSLQ | 0.1528 | 0.0513 | 0.2512 | 22 |
ZUYB | 0.1429 | 0.0649 | 0.3122 | 11 | ZJQH | 0.1507 | 0.0479 | 0.2411 | 23 |
ZYCY | 0.1315 | 0.0586 | 0.3081 | 12 | ZSJG | 0.1603 | 0.0493 | 0.2354 | 24 |

By individual indicator, the top five airports in throughput are ZSLG, ZHXF, ZBCF, ZULZ, and ZPDL; in landing and takeoff times, ZHLY, ZSRZ, ZGWZ, ZHXF, and ZYCY; in prosperity, ZSWF, ZSLQ, ZHLY, ZHXF, and ZSJG, reflecting the high economic level of the regions where they are located; and in population, ZHNY, ZGFS, ZBHD, ZSWF, and ZSJG. Based on the data of each indicator and the comprehensive evaluation results, ZHLY, ZULZ, ZPDL, and ZUMT rank relatively high, indicating that they are important in all aspects of the regional airline network. Key nodes with higher importance generally have more routes, more influence, and higher annual throughput.
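The closeness formula of Step 8 can be checked against the ranking table by recomputing $C_i$ from the reported $S_i^{+}$ and $S_i^{-}$ values for the top five nodes, and by confirming that the six entropy weights sum to one. All numbers below are copied from the text; differences in the fourth decimal place are expected because the reported distances are themselves rounded.

```python
# Values copied from the entropy weights and the ranking table above.
weights = [0.103667, 0.238827, 0.113405, 0.255894, 0.131915, 0.156294]
ranking = [  # (airport, S_i^+, S_i^-, reported C_i)
    ("ZHLY", 0.0665, 0.1535, 0.6978),
    ("ZSRZ", 0.0985, 0.0972, 0.4968),
    ("ZHXF", 0.1043, 0.0793, 0.4318),
    ("ZPDL", 0.1387, 0.0924, 0.3997),
    ("ZULZ", 0.1304, 0.0826, 0.3879),
]


def closeness(s_pos, s_neg):
    """Relative closeness C_i = S_i^- / (S_i^+ + S_i^-)."""
    return s_neg / (s_pos + s_neg)


print(f"sum of weights: {sum(weights):.6f}")
for code, s_pos, s_neg, reported in ranking:
    print(f"{code}: recomputed {closeness(s_pos, s_neg):.4f}, reported {reported:.4f}")
```

The recomputed values also preserve the reported rank order of these five nodes.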

The concept of vulnerability originated in disaster research and was gradually introduced into the transportation and social fields in the 1970s; there is still no unified definition. The vulnerability of a transportation system refers to its susceptibility to certain events that significantly reduce its service capacity. For regional aviation, vulnerability can be defined as the extent to which network connectivity is affected when some airports and routes cannot operate normally because of natural disasters, terrorist attacks, or similar events [

Based on the entropy-weight TOPSIS ranking of node importance in the previous section, the effect of node failures under deliberate attack on the vulnerability of the feeder airline network is examined as follows.

Step 1: Attack and remove the nodes one by one, in descending order of the node importance ranking.

Step 2: Calculate the vulnerability indices after each removal.

Step 3: Plot the results, analyze them, and draw conclusions.
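The three steps above can be sketched as follows. This is a hypothetical illustration rather than the study's program: the graph is a dictionary of neighbour sets, network efficiency is normalized by the original node count (an assumption, so pairs involving removed nodes count as disconnected), both indices are reported as percentages of their intact values, the connectivity index is omitted, and Step 3 (plotting) is left to any plotting library.

```python
from collections import deque


def _bfs(graph, source):
    """Hop distances from source; graph maps node -> set of neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def metrics(graph, n_total):
    """Return (network efficiency, largest connected subgraph size).

    Efficiency is normalized by the ORIGINAL node count n_total (assumption).
    """
    eff, seen, largest = 0.0, set(), 0
    for s in graph:
        dist = _bfs(graph, s)
        eff += sum(1.0 / d for t, d in dist.items() if t != s)
        if s not in seen:
            seen.update(dist)
            largest = max(largest, len(dist))
    return eff / (n_total * (n_total - 1)), largest


def deliberate_attack(graph, ranking, steps):
    """Remove nodes in ranking order; return (removed, E %, S %) per step."""
    n_total = len(graph)
    e0, s0 = metrics(graph, n_total)
    curve = [(0, 100.0, 100.0)]  # initial performance set to 100 %
    g = graph
    for k, node in enumerate(ranking[:steps], start=1):
        # Step 1: remove the next most important node and its edges.
        g = {v: nbrs - {node} for v, nbrs in g.items() if v != node}
        # Step 2: recompute the indices relative to the intact network.
        e, s = metrics(g, n_total)
        curve.append((k, 100 * e / e0, 100 * s / s0))
    return curve


if __name__ == "__main__":
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    for row in deliberate_attack(star, ranking=[0, 1], steps=2):
        print(row)
```

On the toy star network, removing the hub first drops efficiency to zero and leaves a largest subgraph of one node out of five, which mirrors why attacks on high-ranking nodes are the most damaging.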

Python programs for the node attacks were written in the Sublime Text editor, and the nodes of the airline network were removed following the steps above to destroy the connectivity of the network. The initial performance is set to 100%, and the key nodes are attacked deliberately. The relationship between the decline of the vulnerability indices and the damaged nodes is analyzed, as shown in

The change of vulnerability after each node is attacked in

Based on the DBSCAN clustering above, which divides the nodes into four classes (class_0 to class_3), the change in vulnerability of the feeder airline network is examined by randomly attacking the nodes of each class, as follows.

Step 1: Taking category 1, category 2, category 3, and category 4 in turn as the attack target, randomly attack the nodes of that category and remove them.

Step 2: Calculate the vulnerability indices after the nodes are removed.

Step 3: Plot the results, analyze them, and draw conclusions.
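A corresponding sketch for random attacks within one DBSCAN category is given below. It repeats the random removal order several times with a fixed seed and averages the results; the averaging, the number of trials, the label format, and the normalization of efficiency by the original node count are all assumptions of this sketch rather than details given in the text.

```python
import random
from collections import deque


def _bfs(graph, source):
    """Hop distances from source; graph maps node -> set of neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def metrics(graph, n_total):
    """(efficiency normalized by the original node count, largest subgraph size)."""
    eff, seen, largest = 0.0, set(), 0
    for s in graph:
        dist = _bfs(graph, s)
        eff += sum(1.0 / d for t, d in dist.items() if t != s)
        if s not in seen:
            seen.update(dist)
            largest = max(largest, len(dist))
    return eff / (n_total * (n_total - 1)), largest


def random_attack(graph, labels, category, trials=20, seed=0):
    """Randomly remove the nodes of one category, one at a time.

    Returns (removed, mean E %, mean S %) per step, averaged over `trials`
    random removal orders.
    """
    n_total = len(graph)
    e0, s0 = metrics(graph, n_total)
    members = [v for v in graph if labels.get(v) == category]
    rng = random.Random(seed)  # fixed seed for reproducibility
    totals = [[0.0, 0.0] for _ in members]
    for _ in range(trials):
        order = members[:]
        rng.shuffle(order)
        g = graph
        for k, node in enumerate(order):
            g = {v: nbrs - {node} for v, nbrs in g.items() if v != node}
            e, s = metrics(g, n_total)
            totals[k][0] += 100 * e / e0
            totals[k][1] += 100 * s / s0
    return [(k + 1, e / trials, s / trials) for k, (e, s) in enumerate(totals)]


if __name__ == "__main__":
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    labels = {0: "class_0", 1: "class_1", 2: "class_1", 3: "class_1", 4: "class_1"}
    print(random_attack(star, labels, "class_1"))
```

Removing peripheral nodes of the toy star shrinks the largest subgraph by one node per step, whereas removing its single class_0 hub fragments the network at once.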

Python programs for the node attacks were written in the Sublime Text editor, and the nodes of the airline network were removed one after another following the steps above to destroy the connectivity of the network. The initial performance is set to 100%, and random attacks are performed on the nodes of each of the four categories. The relationship between the decline of the vulnerability indices and the damaged nodes is analyzed, as shown in

According to

In conclusion, by combining complex network theory with the DBSCAN clustering algorithm and building on previous research, this study analyzes the regional airline network. Its main contributions are as follows.

This study links complex network indicators with the DBSCAN algorithm from machine learning, explores a new approach to node clustering, and provides a reference for interdisciplinary research.

Based on complex network theory, topological indicators of the regional airline network such as node degree and betweenness, together with vulnerability indicators such as network efficiency, maximum connected subgraph size, and network connectivity, can reveal problems such as insufficient connections between airports and wasted topology, and thus provide a basis for further planning and adjustment of routes.

A comprehensive assessment of the 161 regional aviation nodes shows that airports such as Luoyang, Rizhao, and Xiangyang, whose node degree and betweenness are not high, should also be treated as key nodes for priority protection because of their high landing and takeoff times and prosperity.

Deliberate attacks on airport nodes and random attacks on the four classes of nodes show that the maximum connected subgraph index is less affected, that the network efficiency and connectivity indices follow similar trends, and that network stability improves when the critical nodes are protected.

The study also has some limitations that leave room for improvement.

The selection of the parameters Eps and MinPts in the DBSCAN algorithm needs to be optimized, and the resulting clusters need to be validated.

Because of the epidemic, passenger volumes at regional airports were low and differed little from one another, which weakened the results of the entropy weight method and the clustering.