With the growing number of airports and the expansion of their scale, the aviation network has become complex and hierarchical. To investigate the complex network characteristics of aviation networks, this paper constructs a model of the Chinese aviation network and studies it using complex network theory and the K-means algorithm. First, the P-space model is used to construct the Chinese aviation network model. Then, the complex network indicators degree, clustering coefficient, average path length, betweenness and coreness are selected to investigate the complex characteristics and hierarchical features of the aviation network and to explore their causes. Next, the node hierarchy derived from each of the five indicators provides one candidate value of the initial clustering parameter K for the K-means clustering algorithm, giving five K values in total. Clustering simulation experiments are then conducted to obtain visual clustering results for the nodes of the Chinese aviation network under the different K values, together with silhouette coefficients that evaluate the clustering effect of each indicator, which yields a hierarchical classification of the aviation network under each indicator. Finally, the silhouette coefficient is optimal when K is 4, which gives a four-layer clustering of the aviation network. The experimental results indicate that combining complex network analysis with the K-means algorithm offers good applicability and simplicity while improving accuracy.

Complex networks are pervasive in various aspects of the human social world. Since Strogatz et al. [

The identification of community structure is called community detection. Community detection methods fall into two general types: network partitioning methods and hierarchical clustering methods. Network partitioning methods analyze the community structure of a complex network by dividing it into parts of roughly equal size. However, such methods cannot determine how many communities the network should reasonably be divided into. The Kernighan-Lin algorithm and the spectral algorithm based on the eigenvectors of the graph Laplacian are two important network partitioning methods. Hierarchical clustering methods divide the network into subgroups based on the similarity or strength of the connections between individual nodes. They are classified into agglomerative and divisive methods, depending on whether edges are added to or removed from the network. Hierarchical clustering algorithms usually classify core nodes well but make errors when classifying peripheral nodes. In addition, with the development of machine learning, the field of unsupervised learning provides clustering algorithms such as K-means clustering and the fuzzy C-means algorithm [

In the present study, we combine the K-means clustering algorithm with complex network theory to study aviation networks. The K-means algorithm was first proposed by MacQueen [

Combining clustering algorithms such as K-means with complex networks has gradually become a popular research direction. Starting from the procedure and characteristics of the traditional K-means algorithm, most researchers address its shortcomings and apply theory from the complex network discipline. Tian et al. [

1) Degree and degree distribution. The degree of a node is defined as the number of edges connected to that node. In the aviation network, the airport is the node

The degree distribution is an important statistical feature of the airport network. The degree distribution of nodes in the aviation network can be described by the distribution function
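As an illustration of these definitions, the following minimal Python sketch computes node degrees and the empirical degree distribution P(k), the fraction of nodes whose degree is k, for an undirected, unweighted network given as an edge list. The function names and toy edges are hypothetical, and collapsing repeated flights between the same pair of airports into a single edge is an assumption about how the unweighted network is built.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Undirected, unweighted adjacency sets; duplicate edges and self-loops are dropped."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        if u == v:
            continue  # a self-loop adds no connectivity between airports
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degrees(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Degree k_i = number of distinct neighbours of node i."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def degree_distribution(deg: Dict[Hashable, int]) -> Dict[int, float]:
    """P(k): fraction of nodes whose degree equals k, sorted by k."""
    n = len(deg)
    counts = Counter(deg.values())
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical toy network: A is linked to B, C and D; B is also linked to C.
# The repeated ("B", "A") edge collapses into the existing A-B edge.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
deg = degrees(build_adjacency(edges))
print(deg)                       # degrees: A=3, B=2, C=2, D=1
print(degree_distribution(deg))  # {1: 0.25, 2: 0.5, 3: 0.25}
```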

2) Average path length. The distance between two nodes in the network is the number of edges on the shortest path between them. In the branch aviation network of the present study, the distance

where the total number of possible edges in the network is
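The average path length is the mean shortest-path distance over all node pairs, L = 2/(N(N-1)) Σ_{i<j} d_ij. Because the network is unweighted, one breadth-first search per node yields all distances. The minimal sketch below uses hypothetical function names and a toy adjacency dict, assumes the network is connected, and also returns each node's mean distance to all other nodes, the per-node quantity that the results section divides into levels.

```python
from collections import deque
from typing import Dict, Hashable, Set, Tuple

Adjacency = Dict[Hashable, Set[Hashable]]


def bfs_distances(adj: Adjacency, source: Hashable) -> Dict[Hashable, int]:
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_lengths(adj: Adjacency) -> Tuple[float, Dict[Hashable, float]]:
    """Return (network average path length, per-node mean distance to all other nodes)."""
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two nodes")
    per_node: Dict[Hashable, float] = {}
    total = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("network is not connected; L is undefined")
        per_node[s] = sum(dist.values()) / (n - 1)
        total += sum(dist.values())
    # `total` counts each unordered pair twice, so dividing by N(N-1)
    # gives 2 / (N(N-1)) * sum_{i<j} d_ij.
    return total / (n * (n - 1)), per_node


# Hypothetical path network A - B - C: d(A,B) = 1, d(B,C) = 1, d(A,C) = 2.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
L, per_node = path_lengths(adj)
print(L)         # 4/3, about 1.333
print(per_node)  # {'A': 1.5, 'B': 1.0, 'C': 1.5}
```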

3) Clustering coefficient. The clustering coefficient, also called the agglomeration coefficient, describes how tightly airport nodes cluster together in the aviation network and is an important index of the local structure of the aviation network. The ratio of the actual number of edges
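Since the defining equation is incomplete in the extracted text, the sketch below assumes the standard local clustering coefficient C_i = 2E_i / (k_i(k_i - 1)), where E_i is the actual number of edges among the k_i neighbours of node i, and takes the network value as the mean of C_i over all nodes. Setting C_i = 0 for nodes with fewer than two neighbours is a common convention assumed here; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Set

Adjacency = Dict[Hashable, Set[Hashable]]


def local_clustering(adj: Adjacency, node: Hashable) -> float:
    """C_i = 2 E_i / (k_i (k_i - 1)), where E_i counts edges among the neighbours of i."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for k_i < 2; 0 is a common convention
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj: Adjacency) -> float:
    """Network clustering coefficient: the mean of C_i over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical network: triangle A-B-C plus a pendant node D attached to A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print({v: local_clustering(adj, v) for v in adj})
print(average_clustering(adj))  # (1/3 + 1 + 1 + 0) / 4 = 7/12
```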

4) Betweenness. A certain node

Among them, is the number of shortest paths between nodes
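Betweenness can be computed without enumerating every shortest path by Brandes' algorithm, which counts shortest paths in one breadth-first search per source node and then back-propagates pair dependencies. The following is a minimal pure-Python sketch for an undirected, unweighted network; it returns raw betweenness with each unordered node pair counted once. The surviving text does not show whether betweenness is normalized, so this choice is an assumption, and the function name is hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

Adjacency = Dict[Hashable, Set[Hashable]]


def betweenness(adj: Adjacency) -> Dict[Hashable, float]:
    """Raw betweenness: sum over unordered pairs (s, t) of sigma_st(v) / sigma_st."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack: List[Hashable] = []
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # w discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical star: hub H lies on the only shortest path of every leaf pair.
star = {"H": {"x", "y", "z"}, "x": {"H"}, "y": {"H"}, "z": {"H"}}
print(betweenness(star)["H"])  # 3.0: the pairs (x,y), (x,z), (y,z)
```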

5) Coreness. The k-core of a graph is the subgraph that remains after repeatedly removing all nodes with degree less than k, together with their edges. The number of nodes in this subgraph is the size of the core. The coreness (core number) of a node is the largest k for which the node belongs to the k-core.
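The coreness of every node can be obtained by the peeling procedure just described: repeatedly remove a node of minimum current degree, and record the largest degree threshold reached so far as that node's coreness. The minimal sketch below uses a simple O(N²) minimum search, which is adequate for a network of a few hundred airports; the function names are hypothetical. The star example shows that a high-degree hub can still have coreness 1.

```python
from typing import Dict, Hashable, Set

Adjacency = Dict[Hashable, Set[Hashable]]


def core_numbers(adj: Adjacency) -> Dict[Hashable, int]:
    """Coreness of every node by repeatedly peeling off a minimum-degree node."""
    deg = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core: Dict[Hashable, int] = {}
    k = 0
    while remaining:
        v = min(remaining, key=deg.__getitem__)  # smallest current degree
        k = max(k, deg[v])  # the peeling threshold never decreases
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1  # removing v lowers its remaining neighbours' degrees
    return core


def k_core(adj: Adjacency, k: int) -> Set[Hashable]:
    """Nodes of the k-core: exactly those with coreness >= k."""
    return {v for v, c in core_numbers(adj).items() if c >= k}


# Hypothetical networks: a triangle with a pendant node, and a star.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
star = {"H": {"x", "y", "z"}, "x": {"H"}, "y": {"H"}, "z": {"H"}}
print(core_numbers(adj))      # A, B and C have coreness 2; D has coreness 1
print(core_numbers(star)["H"])  # 1, although H has degree 3
```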

An aviation network is a complex network system. Models of such networks are generally constructed in a chosen space; the L-space, R-space and P-space methods are commonly used to construct them [
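The surviving text does not spell out the P-space construction, so the sketch below assumes the usual definition from transport-network studies: every pair of airports served by the same flight, including any intermediate stops, is linked, whereas L-space links only consecutive stops. Routes are hypothetical sequences of ICAO codes, and the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Iterable, Sequence, Set

Adjacency = Dict[str, Set[str]]


def _add_edge(adj: Adjacency, a: str, b: str) -> None:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def p_space(routes: Iterable[Sequence[str]]) -> Adjacency:
    """P-space (assumed definition): link every pair of airports on the same route."""
    adj: Adjacency = {}
    for route in routes:
        stops = list(dict.fromkeys(route))  # drop repeated stops, keep order
        for a in stops:
            adj.setdefault(a, set())
        for a, b in combinations(stops, 2):
            _add_edge(adj, a, b)
    return adj


def l_space(routes: Iterable[Sequence[str]]) -> Adjacency:
    """L-space, for comparison: link only consecutive stops on each route."""
    adj: Adjacency = {}
    for route in routes:
        for a in route:
            adj.setdefault(a, set())
        for a, b in zip(route, route[1:]):
            if a != b:
                _add_edge(adj, a, b)
    return adj


# Hypothetical schedule: Beijing -> Shanghai -> Guangzhou with one stop,
# plus a direct flight Beijing -> Chengdu.
routes = [("ZBAA", "ZSSS", "ZGGG"), ("ZBAA", "ZUUU")]
print(sorted(p_space(routes)["ZBAA"]))  # ['ZGGG', 'ZSSS', 'ZUUU']
print(sorted(l_space(routes)["ZBAA"]))  # ['ZSSS', 'ZUUU']
```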

The K-means algorithm is a representative prototype-based clustering method driven by an objective function: a distance from the data points to the cluster prototypes is used as the objective function to be optimized [

Step 1: Arbitrarily select k objects as the initial cluster centers.

Step 2: Assign each object in D to the cluster whose center, the mean of the objects in that cluster, is nearest.

Step 3: Update the cluster means, i.e., recalculate the center of each cluster.

Step 4: Repeat Steps 2 and 3 until the cluster centers no longer change or the number of iterations exceeds the preset maximum.
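Steps 1 to 4 describe the standard (Lloyd-style) K-means iteration. The following is a minimal pure-Python sketch, assuming Euclidean distance and numeric feature tuples; the function names are hypothetical. The result depends on the random initial centers, so in practice several seeds are often tried and the best run kept.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _assign(data: Sequence[Point], centers: List[Point]) -> List[int]:
    """Step 2: index of the nearest center (Euclidean distance) for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in data]


def kmeans(data: Sequence[Point], k: int, max_iter: int = 100,
           seed: Optional[int] = None) -> Tuple[List[int], List[Point]]:
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(list(data), k)]  # Step 1: k arbitrary objects
    for _ in range(max_iter):
        labels = _assign(data, centers)  # Step 2
        new_centers: List[Point] = []
        for j in range(k):  # Step 3: recompute each cluster mean
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the old center of an empty cluster
        if new_centers == centers:  # Step 4: centers no longer change
            return labels, centers
        centers = new_centers
    return _assign(data, centers), centers  # iteration limit reached


# Hypothetical 2-D data with two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(data, 2, seed=1)
print(labels, sorted(centers))  # centers converge to (0.0, 0.5) and (10.0, 10.5)
```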

In this paper, the silhouette coefficient is used to evaluate the quality of the clustering results. In terms of dataset samples

Among them,

For a particular clustering of a dataset, the silhouette coefficient

where n is the number of samples in the data set and
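Because the silhouette formulas are incomplete in the extracted text, the sketch below assumes the standard definition: for sample i, a(i) is the mean distance to the other members of its own cluster, b(i) is the smallest mean distance to the members of any other cluster, s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the overall coefficient is the mean of s(i) over all n samples. Setting s(i) = 0 for a sample that is alone in its cluster is a common convention assumed here; the function name is hypothetical.

```python
import math
from typing import Hashable, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette_score(data: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Mean silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all n samples."""
    n = len(data)
    clusters = set(labels)
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("silhouette needs 2 <= number of clusters <= n - 1")
    total = 0.0
    for i, p in enumerate(data):
        sums = dict.fromkeys(clusters, 0.0)
        counts = dict.fromkeys(clusters, 0)
        for j, q in enumerate(data):
            if j != i:  # distances to every other sample, grouped by cluster
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(data, [0, 0, 1, 1]))  # about 0.90: compact, well-separated clusters
print(silhouette_score(data, [0, 1, 0, 1]))  # negative: points sit closer to the other cluster
```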

The data used in the present study come from the Civil Aviation Administration of China’s (CAAC) Advance Flight Planning Management System (AFPMS) and cover China’s domestic flight schedules for the 2020 summer and fall seasons. The data include 9,278 flight segments among 232 airport nodes.

In this paper, we use indicators from complex network theory to explore the complex characteristics of China’s aviation network, and for each indicator we choose an appropriate number of clusters K for clustering analysis according to the distribution of that indicator.

1) Degree and its distribution. Degree is a simple but important index in complex network research, and the degree distribution of the nodes is generally used to determine the network type. We calculate the degree of all 232 Chinese airport nodes in the selected data and obtain the degree distribution of the nodes of China’s aviation network, which can be found in the

Fitting curves to the probability distribution of node degree shows that a Gaussian curve fits well, whereas power-law and polynomial curves do not. As shown in

The degree of an airport node expresses its connectivity to other airports. Calculating the proportion of airports that fall within each degree range reflects, to a certain extent, the hierarchical distribution of airport nodes in China’s aviation network. On this basis, the nodes of China’s aviation network are divided into levels, as presented in

category | Range of degree values | Connection ratio | Number of airports | Proportion of airports |
---|---|---|---|---|
1 | 129 | 55.60% | 2 | 0.86% |
2 | 97 | 41.81% | 5 | 2.16% |
3 |  | 31.47% | 8 | 3.45% |
4 |  | 20.26% | 23 | 10.34% |
5 | 2 | 10.34% | 24 | 10.34% |
6 | 1 | 0.43% | 170 | 72.28% |

Through the analysis of

2) Average path length. The constructed aviation network is undirected and unweighted. The average shortest path length of the whole network is 2.15, meaning that the shortest path between any two nodes contains 2.15 edges on average. For a network of 232 nodes, this average path length is small, which to some extent reflects the small-world character of the aviation network. In addition, the average shortest path length from each node to all other nodes is divided into five levels, as shown in

category | Average path length | Number of airports | Proportion of airports |
---|---|---|---|
1 | 1.45 | 2 | 0.86% |
2 | 1.53–1.99 | 61 | 26.29% |
3 | 2.00–2.48 | 141 | 60.78% |
4 | 2.50–2.92 | 27 | 11.64% |
5 | 3.58 | 1 | 0.43% |

3) Clustering coefficient. The clustering coefficient measures how strongly nodes cluster in a complex network; in most real-world networks, nodes show a strong tendency to cluster. The average clustering coefficient of China’s aviation network is 0.639. According to the measurement results, the node clustering coefficients are divided into 8 levels, as presented in

category | Clustering coefficient | Number of airports | Proportion of airports |
---|---|---|---|
1 | 0 | 21 | 9.05% |
2 | 0.16–0.48 | 36 | 15.52% |
3 | 0.50–0.59 | 27 | 11.64% |
4 | 0.61–0.69 | 34 | 14.66% |
5 | 0.70–0.79 | 35 | 15.09% |
6 | 0.80–0.89 | 34 | 14.66% |
7 | 0.90–0.93 | 12 | 5.17% |
8 | 1 | 33 | 14.22% |

4) Betweenness. The betweenness of a node reflects its necessity and influence in the whole aviation network: the greater the betweenness, the stronger the node’s centrality in the network. The calculation results show that node betweenness is distributed extremely unevenly, indicating that Chinese airports differ greatly in their importance as hubs of the aviation network. The distribution of betweenness can be divided into four levels in

category | Betweenness | Number of airports | Proportion of airports |
---|---|---|---|
1 |  | 117 | 50.43% |
2 |  | 77 | 33.19% |
3 |  | 29 | 12.50% |
4 |  | 9 | 3.88% |

5) Coreness. By definition, the coreness of a node measures how deep the node lies inside the “core” of the aviation network. A large coreness for the whole network indicates that most nodes will not easily drop out of the network when other nodes are damaged. The coreness of the whole network is the maximum coreness over all of its nodes. The aviation network constructed in this paper has a coreness of 31, and four nodes have coreness 31. For a network of 232 nodes, this value is not high. Even a node with a very high degree may have a very small coreness; for example, the center node of a star network has coreness 1, the same as any node of degree 1. Because coreness values are distributed fairly uniformly, it is difficult to divide them clearly into levels. Therefore, a new parameter, the “coreness degree ratio”, is proposed to measure the level of each node, as shown in

category | Coreness / degree ratio | Number of airports | Proportion of airports |
---|---|---|---|
1 | 1.00–1.93 | 199 | 85.78% |
2 | 2.00–4.71 | 28 | 12.07% |
3 | 5.14–6.79 | 5 | 2.16% |


The airport nodes are divided into three levels according to the coreness degree ratio; the higher this ratio, the higher the status and importance of an airport node in the network. Based on complex network theory, the cluster value K is therefore set to 6, 8, 5, 4 and 3 for degree, clustering coefficient, average path length, betweenness and coreness, respectively, which provides a scientific basis for the algorithm to choose the initial cluster value reasonably.

Input: China aviation network G (V, E), complex characteristic matrix M (F, N)

Output: Partition of the China aviation network G (V, E)

Algorithm process:

1) Construction of China’s aviation network. The undirected, unweighted Chinese aviation network G (V, E) is constructed in P-space. The complex characteristic matrix M (F, N) of each node

2) Determination of K. The distribution of each complex network characteristic of China’s aviation network, namely degree, clustering coefficient, average path length, betweenness and coreness, is divided into levels, and K is determined accordingly.

3) Determination of the initial cluster centers. As in the K-means algorithm, K initial cluster centers are selected at random.

4) Distance measurement. In this paper, the Euclidean distance is used as the distance measure in the K-means algorithm, and its formula is

5) Cluster assignment. The distance between each node and each cluster center is computed, and each node is assigned to the cluster whose center is nearest.

6) Updating the cluster centers. For each cluster, the sum of the distances between each node and the other nodes in the cluster is calculated, and the node with the smallest sum becomes the new cluster center.

7) Iteration. Repeat steps 5) and 6) until the cluster centers no longer change.

8) Cluster evaluation. The silhouette coefficient is used to evaluate and score the clustering result for each K value.

9) Optimal clustering. The clustering with the highest silhouette coefficient score is taken as the final community division result.
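Steps 3) to 9) can be sketched as follows. Note that step 6) moves each center to the cluster member with the smallest sum of distances to the other members (a medoid, as in K-medoids) rather than to the cluster mean used in Steps 1 to 4 of the standard algorithm. The sketch assumes that the rows of the characteristic matrix are already standardized numeric tuples and that every candidate K is evaluated on the same feature matrix; the function names and toy data are hypothetical, and this is an illustration rather than the authors’ implementation. A single random initialization can end in a local optimum, so repeating step 3) with several seeds may help.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def assign(data: Sequence[Point], centers: List[int]) -> List[int]:
    """Steps 4-5: put each node in the cluster whose center is nearest (Euclidean)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, data[centers[j]]))
            for p in data]


def medoid_kmeans(data: Sequence[Point], k: int, max_iter: int = 100,
                  seed: Optional[int] = None) -> List[int]:
    rng = random.Random(seed)
    centers = rng.sample(range(len(data)), k)  # step 3: k random nodes as centers
    for _ in range(max_iter):
        labels = assign(data, centers)
        new_centers = []
        for j in range(k):  # step 6: member with the smallest distance sum
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                new_centers.append(centers[j])
                continue
            new_centers.append(min(members, key=lambda m: sum(
                math.dist(data[m], data[i]) for i in members)))
        if new_centers == centers:  # step 7: stop when centers are stable
            break
        centers = new_centers
    return assign(data, centers)


def silhouette(data: Sequence[Point], labels: List[int]) -> float:
    """Step 8: mean silhouette coefficient (singleton clusters score 0)."""
    if len(set(labels)) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i, p in enumerate(data):
        own = [math.dist(p, q) for j, q in enumerate(data) if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(own) / len(own)
        b = min(sum(math.dist(p, q) for j, q in enumerate(data) if labels[j] == c)
                / labels.count(c) for c in set(labels) if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(data)


def best_k(data: Sequence[Point], candidate_ks: Sequence[int],
           seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Step 9: keep the K whose clustering has the highest silhouette score."""
    scores = {k: silhouette(data, medoid_kmeans(data, k, seed=seed)) for k in candidate_ks}
    return max(scores, key=scores.get), scores


# Hypothetical standardized features for four nodes forming two groups.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
k, scores = best_k(data, [2, 3])
print(k, scores)  # K = 2 wins for these two well-separated groups
```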

Using the improved K-means algorithm, clustering results are obtained for K values of 3, 4, 5, 6 and 8. Each result is evaluated with the silhouette coefficient, and the silhouette coefficients of the clusters in each result are visualized together with the standardized clustering data. Details can be found in

The silhouette coefficient score is highest when K is 4, indicating the best clustering effect, as shown in the

K value | Indicator | Silhouette coefficient score |
---|---|---|
3 | Coreness | 0.5516641001704506 |
4 | Betweenness | 0.5665424947645851 |
5 | Characteristic path length | 0.4294426821758907 |
6 | Degree | 0.41790030425900554 |
8 | Clustering coefficient | 0.40671616536198535 |

When K value is 4, the clustering results are shown in the

category | Airport Code (ICAO) | Division basis |
---|---|---|

1 | ZSPD, ZGGG, ZBAD, ZLXY, ZUUU | Hub airport |

2 | ZBAA, ZWWW, ZBTJ, ZGSZ, ZUCK, ZSHC, ZPPP, ZGHA, ZLXN | Major airport |

3 | ZJHK, ZBHH, ZSNJ, ZSAM, ZHCC, ZUGY, ZYTL, ZSQD, ZSSS, ZSWZ, ZSFZ, ZHHH, ZBAL, ZBLA, ZWKL, ZYHB, ZBDS | Secondary airport |

4 | ZSSH, ZSJN, ZGNN, ZGSD, ZGHZ, ZGOW, ZSCN, ZPTC, ZSQZ, ZPJH, ZBSJ, ZSOF, ZYTX, ZULS, ZJSY, ZWSH, ZSWX, ZBYN, ZSNB, ZGDY, ZLLL, ZGKL, ZSXZ, ZSYT, ZGZJ, ZLDH, ZWAK, ZLIC, ZSLY, ZWKM, ZGBH, ZSJG, ZYCC, ZBMZ, ZPLJ, ZUBJ, ZULB, ZSNT, ZBOW, ZWAT, ZGWZ, ZGYY, ZBTL, ZBEN, ZULZ, ZSWH, ZBCF, ZSTX, ZUZH, ZUYI, ZSYA, ZBYC, ZYJX, ZWTN, ZSZS, ZWYN, ZUWX, ZWTL, ZUMY, ZWBL, ZLHZ, ZSCG, ZSFY, ZHYC, ZSDY, ZUTR, ZSGZ, ZGLG, ZPDQ, ZHXF, ZLYL, ZUMT, ZGFS, ZYMD, ZLYA, ZPJM, ZUYB, ZUXC, ZBHD, ZPWS, ZPSM, ZYDQ, ZYCY, ZPDL, ZUZY, ZPMS, ZBCD, ZGHY, ZSYN, ZUPS, ZWKN, ZBXZ, ZGZH, ZYYJ, ZUDC, ZBUH, ZYQQ, ZBYZ, ZLLN, ZHLY, ZPBS, ZUBD, ZHXY, ZGSY, ZSLG, ZGCD, ZYJD, ZBDT, ZYBS, ZSYW, ZUNC, ZSSR, ZYBA, ZBXH, ZSLQ, ZBDH, ZHSY, ZBUL, ZYYK, ZUNZ, ZBZJ, ZYJM, ZLJQ, ZUDX, ZBER, ZUGU, ZBLF, ZHNY, ZSYC, ZGBS, ZSWY, ZHES, ZWHZ, ZWHM, ZWSC, ZJQH, ZUJZ, ZBCZ, ZUHY, ZSWF, ZSJU, ZPNL, ZURK, ZGMX, ZSJH, ZPZT, ZUAS, ZSGS, ZYSQ, ZSJD, ZYJZ, ZYAS, ZBLL, ZLZY, ZLJC, ZYDD, ZYTN, ZLGM, ZLGY, ZLYS, ZLZW, ZSLO, ZSRZ, ZSSM, ZUAL, ZUQJ, ZWKC, ZWTC, ZYHE, ZYLD, ZYMH, ZBES, ZBHZ, ZGCJ, ZHSN, ZLHX, ZPCW, ZPLC, ZUKD, ZWCM, ZWFY, ZWTS, ZYFY, ZYJS, ZBAR, ZBSN, ZBUC, ZBZL, ZGHC, ZJYX, ZLDL, ZLGL, ZLHB, ZLTS, ZSAQ, ZUBZ, ZUGZ, ZUNP, ZUWS, ZWNL, ZWRQ | Regional Airport |

In conclusion, building on previous research and combining complex network theory with the K-means clustering algorithm, we have conducted an applied study of the Chinese aviation network. The main contributions are as follows.

1) This study links community detection in the field of complex networks with clustering algorithms in machine learning, explores new methods for detecting community structure, and offers a reference for interdisciplinary research in related fields.

2) This work constructs a Chinese aviation network model in P-space and investigates the complex characteristics of the Chinese aviation network based on degree, clustering coefficient, average path length, betweenness and coreness.

3) A new method for determining K is proposed: the distribution of each network complexity indicator is divided into hierarchy levels, and the silhouette coefficient is applied to evaluate the resulting clusterings. Among the five indicators, betweenness (K = 4) gives the best node clustering of the Chinese aviation network.

This study also revealed several areas for improvement.

1) The distance metric of the complex characteristic matrix in the K-means algorithm needs to be optimized.

2) Although selecting K through subjective hierarchical division achieved good results, it is error-prone. Better algorithms for dividing the complex features into levels could also be explored.

3) The final partition separates hub nodes and trunk nodes well, but the algorithm performs less well at the other hierarchy levels.