Role-based network embedding aims to embed role-similar nodes close together in the embedding space and is widely used in graph mining tasks such as role classification and detection. Roles are sets of nodes in graph networks with similar structural patterns and functions. However, role-similar nodes may be far apart or even disconnected from each other. Meanwhile, neighborhood node features and noise also affect the result of role-based network embedding, and both remain challenges for current network embedding work. In this paper, we propose Role-based network Embedding via Quantum walk with weighted Features fusion (REQF), which simultaneously considers the influence of global and local role information, node features, and noise. Firstly, we capture the global role information of nodes via a quantum walk based on its superposition property, and we emphasize the local role information via a biased quantum walk. Secondly, we use a quantum walk weighted characteristic function to extract and fuse the features of nodes and their neighborhoods according to their different distributions, which implicitly contain role information. Finally, we leverage a Variational AutoEncoder (VAE) to reduce the effect of noise. We conduct extensive experiments on seven real-world datasets, and the results show that REQF is more effective at capturing role information in the network: it outperforms the best baseline by up to 14.6% in role classification and by 23% in role detection on average.
Roles are sets of nodes with similar structural patterns and functions. Role-based network embedding aims to project role-similar nodes into a compact low-dimensional vector space. It is widely used in downstream tasks such as role classification. The concept of roles first appeared in sociology [
Most previous studies focus mainly on local context such as node structure and rarely measure relevance globally, so they ignore the similarity of nodes in the global context, especially when the nodes are far apart. For example, Role-based graph embedding (Role2vec) [
In addition to the features of a node itself, its neighborhood nodes also affect role-based network embedding. For example, consider a music teacher in a social network who is interested in both music and photography: the teacher's role is influenced more by music than by photography, and the teacher's neighborhood is likely to contain more music lovers than photography lovers. Most current feature-based methods focus on linear aggregation [
Nevertheless, the extraction or aggregation of multiple features is extremely sensitive to noise that may interfere with the final embedding. VAE [
In summary, current role-based network embedding faces three main challenges. 1) Role-similar nodes may be far apart, which makes global role information hard to capture. 2) The varied distributions of nodes and their features are often ignored or simplified. 3) Noise may adversely affect the embedding. To address these challenges, we propose a method called Role-based network Embedding via Quantum walk with weighted Features fusion (REQF).
Firstly, we superimpose all walk paths based on the superposition property of the quantum walk to capture global role information through multi-step evolution sequences, and we use the biased quantum walk to emphasize local structure and learn role closeness. Secondly, we design a quantum walk weighted characteristic function to fuse the features of nodes and their neighborhoods according to their different distributions. Finally, we use VAE to reduce the effects of noise and generate the role-based network embedding. Extensive experiments on real-world networks demonstrate the effectiveness and stability of the proposed REQF.
Our main contributions are listed as follows:
We propose a novel method, REQF, which simultaneously considers the influence of global and local role information, node features, and noise when generating role-based network embedding.
We use the quantum walk to capture role information from global and local perspectives, which addresses the node distance problem, and we construct a quantum walk weighted characteristic function, which uses the quantum walk probabilities as the weights of the characteristic function, to take node features into account during feature fusion.
We use VAE to reduce the effect of noise in the network and obtain the optimal role-based network embedding.
The rest of the paper is organized as follows: In
In graph networks, roles are sets of nodes with similar structural patterns and functions. Roles were first defined as classes of structurally equivalent nodes, where structural equivalence is measured by the structural similarity of nodes [
Role-based network embedding is concerned with the structural similarity of nodes and is independent of the distance between them. To represent role information, Role-based graph embedding (Role2vec) [
Quantum walk [
At present, the quantum walk shows considerable promise for role embedding. Most quantum walks on graphs are implemented by calculating the superposition of probability distributions. Fast Quantum Walk Kernel (FQWK) [
On the other hand, most current feature-based network embedding methods generate embeddings by simple extraction or linear aggregation of features. For example, Role eXtraction (RolX) [
Furthermore, noise hinders the generation of high-quality network embeddings. Because they over-rely on neighborhood information, feature-based methods are sensitive to noise, which degrades the generated embedding. RESD [
In this paper, we address the limitations of role-based embedding with respect to node distance and features: we use a quantum walk and a weighted characteristic function to capture global role information and fuse node features, and we use VAE to reduce the effect of noise, finally obtaining a more compact role-based network embedding.
In this section, we introduce the notation and the proposed method REQF. We first start with an overview and then present the detailed designs, including role information representation, feature information fusion, and noise effect reduction. Finally, we analyze the computational complexity of REQF.
For clarity, the symbols and their definitions are listed in
Symbols  Definitions 

The graph network, 

The state of quantum walk  
The superposition of quantum states at the node 

The evolution operator of quantum walk  
The complexvalued probability amplitude of node 

The probability of quantum walk being at a node 

The probability distribution of all nodes at the 

The role of nodes 

The evolution sequence of nodes 

The summation function by row  
The similarity measure function  
The neighborhood nodes of node 

The adjacency matrix  
The degree of a node 

The biased probability  
The evolution times of the global part and local part  
The proximity ranking function by row  
The feature vector of node 

The evaluation points  
The node representation of the global role, local role, feature, and VAE  
Role-based network embedding 
We define the probability of quantum walk being at a node
where
Role-similar nodes should be embedded into similar representations.
The illustration of our framework REQF is shown in
First of all, we utilize quantum walk to capture the global role information
REQF captures role information via quantum walk. It captures global and local role information by controlling the initial state probability. The global role information is the role relevance between nodes across the whole network. The local role information is the role proximity between nodes derived from their local structure, and it places more emphasis on the correlation between nodes and their neighborhoods.
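To make the role of the initial state concrete, the following is a minimal sketch of a simulated quantum walk, not the exact operator used by REQF. It assumes a continuous-time walk whose Hamiltonian is the adjacency matrix, so that the evolution operator is U(t) = exp(-iAt); the function name `evolution_sequences` and the toy path graph are hypothetical. A uniform initial state plays the part of the global walk, and a state localized on one node plays the part of the biased (local) walk.

```python
import numpy as np

def evolution_sequences(adj, steps, init=None):
    """Per-node probability sequences of a simulated continuous-time quantum walk.

    Assumption: the Hamiltonian is the adjacency matrix, U(t) = exp(-i * A * t).
    Returns an array of shape (n_nodes, steps); column t-1 holds the probability
    of finding the walker at each node at time t.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    if init is None:
        # Uniform superposition: every node has the same initial probability.
        init = np.full(n, 1.0 / np.sqrt(n))
    # A is real symmetric, so exp(-iAt) follows from its eigendecomposition.
    evals, evecs = np.linalg.eigh(adj)
    coeffs = evecs.T @ np.asarray(init, dtype=complex)
    seqs = np.empty((n, steps))
    for t in range(1, steps + 1):
        amplitudes = evecs @ (np.exp(-1j * evals * t) * coeffs)
        seqs[:, t - 1] = np.abs(amplitudes) ** 2  # squared modulus of the amplitude
    return seqs

# Toy path graph 0 - 1 - 2 - 3 (hypothetical example data).
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])

global_seq = evolution_sequences(path, steps=3)                     # uniform start
local_seq = evolution_sequences(path, steps=3, init=np.eye(4)[0])   # start at node 0
```

In this toy graph the two end nodes are never adjacent, yet they receive identical global sequences because they are structurally equivalent, which is the property a role-based embedding needs.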
When all nodes have the same initial state probability, they evolve simultaneously over the entire network, and REQF simulates the global evolution by the probability distribution matrix
where
If the initial state probability of a particular node is higher, the evolution of the quantum walk should be biased to emphasize the similarity of the local structure of the node [
where
Different distributions of neighborhood features may affect nodes differently in real networks. FEATHER has theoretically demonstrated that graphs with the same structure have the same Characteristic Functions (CF). Therefore, to account for the varied distributions of node neighborhood features, we use the CF on the graph to describe the neighborhood feature distribution of each node instead of fusing feature information by simple linear aggregation. The CF is weighted by probabilities defined by the quantum walk, so that the resulting feature information highlights the structural properties of the nodes.
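As an illustration of the idea, the sketch below evaluates a FEATHER-style characteristic function in which the weights come from a simulated quantum walk. It rests on several assumptions that are not taken from the paper: the walk is continuous-time with the adjacency matrix as Hamiltonian, the weight of node w for node u is the probability that a walker started at u is found at w at time t, and the names `quantum_walk_weights` and `weighted_characteristic_function` are hypothetical.

```python
import numpy as np

def quantum_walk_weights(adj, t):
    """Row u holds the distribution of a walker that started at node u (assumed CTQW)."""
    evals, evecs = np.linalg.eigh(np.asarray(adj, dtype=float))
    U = (evecs * np.exp(-1j * evals * t)) @ evecs.T  # U(t) = exp(-i * A * t)
    return (np.abs(U) ** 2).T                        # rows sum to 1 because U is unitary

def weighted_characteristic_function(weights, features, thetas):
    """Evaluate sum_w P(w|u) * exp(i * theta * x_w) for each node, feature, and theta.

    Returns the real (cosine) parts followed by the imaginary (sine) parts,
    with shape (n_nodes, 2 * n_features * n_thetas).
    """
    features = np.asarray(features, dtype=float)   # (n, d)
    thetas = np.asarray(thetas, dtype=float)       # (k,)
    phase = features[:, :, None] * thetas[None, None, :]           # (n, d, k)
    real = np.einsum("uw,wdk->udk", weights, np.cos(phase))
    imag = np.einsum("uw,wdk->udk", weights, np.sin(phase))
    n = features.shape[0]
    return np.concatenate([real.reshape(n, -1), imag.reshape(n, -1)], axis=1)

# Toy path graph 0 - 1 - 2 - 3 with one feature per node (hypothetical data).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
features = np.array([[0.0], [1.0], [1.0], [0.0]])
thetas = np.linspace(0.0, 2.0, 5)   # evaluation points

W = quantum_walk_weights(adj, t=1.0)
Z = weighted_characteristic_function(W, features, thetas)
```

Unlike a weighted mean of neighbor features, the values at several evaluation points describe the shape of the neighborhood feature distribution, so two neighborhoods with the same mean but different spread yield different vectors.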
According to the definition of the CF, we define the probability weights as the probability value of the quantum walk distribution matrix
where
In reality, a lot of noise exists in extracting the role and feature information of nodes, and some node features unrelated to roles may interfere with the final rolebased network embedding. To reduce the effect of noise, we use VAE to encode node role and feature information into the node embedding representation.
We use a Multi-Layer Perceptron (MLP) as our encoder and define it as follows:
where
The node embedding representation
where
Similarly, we use an MLP as the decoder:
where
Finally, we define the loss function and the final embedding is obtained as follows:
The whole process of REQF is described in Algorithm 1.
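The encoder, sampling, and decoder steps of a VAE can be sketched as a single forward pass. This is an illustrative sketch only: the layer sizes, the random weights, and the loss (squared reconstruction error plus the KL divergence to a standard normal prior, the usual VAE objective) are assumptions rather than the settings of REQF, and the helper names `init_layers`, `mlp`, and `vae_forward` are hypothetical. Training would normally be done with an automatic differentiation framework and is omitted here.

```python
import numpy as np

def init_layers(sizes, rng):
    """Random weights and zero biases for an MLP with the given layer sizes."""
    weights = [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    return weights, biases

def mlp(x, weights, biases):
    """Tiny MLP: ReLU on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def vae_forward(x, enc, dec, rng):
    """One forward pass: encode, sample with the reparameterization trick, decode."""
    h = mlp(x, *enc)
    mu, logvar = np.split(h, 2, axis=1)          # mean and log-variance of q(z|x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # z = mu + sigma * eps
    x_hat = mlp(z, *dec)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    return mu, x_hat, recon + kl, kl

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))              # 6 nodes, 8 fused input dimensions (toy data)
enc = init_layers([8, 16, 2 * 4], rng)   # encoder outputs mean and log-variance
dec = init_layers([4, 16, 8], rng)
mu, x_hat, loss, kl = vae_forward(x, enc, dec, rng)
```

In this sketch the mean `mu` would serve as the node embedding after training; the KL term is what pushes the latent code toward a smooth prior and thereby limits how much input noise it can encode.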
The computational complexity of the proposed REQF mainly depends on the quantum walk and VAE, which are highly related to the number of nodes and feature dimensions.
Given a network
The computational complexity of the REQF algorithm is relatively high because the quantum walk is simulated on a classical computer. However, the complexity of the actual quantum walk is only
In this section, we evaluate the performance of REQF on real-world networks through role classification, role detection and visualization, parameter sensitivity analysis, and an ablation study.
We conduct experiments on seven real-world networks with unweighted, undirected edges. The detailed statistics of the datasets are shown in
Datasets  Nodes  Edges  Features  Classes 

Brazil flights  131  1,003  248  4 
Europe flights  399  5,993  652  4 
USA flights  1,190  13,599  1,160  4 
Cora  2,708  5,429  300  7 
Actor  7,758  26,646  2,836  4 
LastFM Asia  7,624  27,806  7,842  18 
Film  27,312  122,706  1,292  4 
Air-traffic networks [
We compare REQF with classic and advanced role-based network embedding methods including Role2vec [
For REQF, we set the final parameters according to the parameter sensitivity experiment. We set quantum walk evolution times
We conduct the task of role-based node classification on seven real-world networks. In this experiment, we feed the embedding into a linear logistic regression classifier; 70% of the data is randomly selected for training, and the rest is used for testing. We run all the methods 20 times and compute their average F1 score to measure the accuracy of the classification and the area under the curve (AUC) score to measure its quality. The results are shown in
Datasets  Role2vec  FEATHER  RED  RESD  REQF 

Brazil flights  0.348 ± 0.069  0.436 ± 0.062  0.226 ± 0.073  0.742 ± 0.062  
Europe flights  0.303 ± 0.047  0.508 ± 0.045  0.350 ± 0.022  0.478 ± 0.056  
USA flights  0.409 ± 0.033  0.564 ± 0.022  0.268 ± 0.021  0.556 ± 0.013  
Cora  0.752 ± 0.016  0.113 ± 0.017  0.145 ± 0.014  0.581 ± 0.012  
Actor  0.307 ± 0.011  0.401 ± 0.006  0.225 ± 0.018  0.459 ± 0.014  
LastFM Asia  0.711 ± 0.009  0.112 ± 0.001  0.043 ± 0.003  0.563 ± 0.006  
Film  0.308 ± 0.005  0.356 ± 0.005  0.437 ± 0.005  0.433 ± 0.004 
Datasets  Role2vec  FEATHER  RED  RESD  REQF 

Brazil flights  0.365 ± 0.076  0.448 ± 0.056  0.279 ± 0.095  0.745 ± 0.057  
Europe flights  0.304 ± 0.044  0.524 ± 0.050  0.415 ± 0.024  0.488 ± 0.049  
USA flights  0.412 ± 0.031  0.568 ± 0.019  0.297 ± 0.019  0.564 ± 0.015  
Cora  0.760 ± 0.013  0.142 ± 0.016  0.168 ± 0.012  0.633 ± 0.014  
Actor  0.315 ± 0.012  0.424 ± 0.008  0.250 ± 0.008  0.478 ± 0.014  
LastFM Asia  0.809 ± 0.005  0.135 ± 0.002  0.043 ± 0.004  0.716 ± 0.005  
Film  0.346 ± 0.005  0.417 ± 0.007  0.757 ± 0.004  0.737 ± 0.006 
Datasets  Role2vec  FEATHER  RED  RESD  REQF 

Brazil flights  0.577 ± 0.050  0.632 ± 0.038  0.519 ± 0.063  0.830 ± 0.038  
Europe flights  0.536 ± 0.029  0.683 ± 0.031  0.610 ± 0.016  0.659 ± 0.033  
USA flights  0.608 ± 0.021  0.702 ± 0.013  0.531 ± 0.013  0.709 ± 0.010  
Cora  0.860 ± 0.008  0.450 ± 0.009  0.514 ± 0.007  0.828 ± 0.008  
Actor  0.543 ± 0.008  0.616 ± 0.005  0.450 ± 0.005  0.652 ± 0.009  
LastFM Asia  0.854 ± 0.003  0.496 ± 0.001  0.036 ± 0.002  0.754 ± 0.003  
Film  0.564 ± 0.004  0.611 ± 0.005  0.838 ± 0.003  0.825 ± 0.004 
The results of role classification indicate that the proposed REQF achieves the best performance on five networks, scoring up to 14.6% higher than the best baseline method. However, the performance of REQF on the Cora and LastFM Asia networks is lower than that of FEATHER and Role2vec, respectively. The reason may be that the users of these two social networks are mostly mutual strangers, with relatively equal interconnection and less obvious role characteristics, resulting in poor performance in role-based classification. This explanation is supported by the fact that both datasets also show low performance with the RED and RESD methods, which are likewise based on role embedding. Overall, the results demonstrate the effectiveness of REQF in the role classification task.
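The evaluation protocol described above (a random 70/30 split repeated 20 times, with averaged F1 scores) can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the logistic regression classifier used in the experiments, and AUC is omitted; reporting both micro and macro averages is an assumption, and the function names and toy data are hypothetical.

```python
import numpy as np

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    tp = np.array([np.sum((y_pred == c) & (y_true == c)) for c in classes])
    fp = np.array([np.sum((y_pred == c) & (y_true != c)) for c in classes])
    fn = np.array([np.sum((y_pred != c) & (y_true == c)) for c in classes])
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    macro = np.mean(2 * tp / (2 * tp + fp + fn))
    return float(micro), float(macro)

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT logistic regression): assign the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def repeated_holdout(X, y, runs=20, train_frac=0.7, seed=0):
    """Mean and standard deviation of (micro, macro) F1 over random splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(runs):
        perm = rng.permutation(len(y))
        cut = int(train_frac * len(y))
        train, test = perm[:cut], perm[cut:]
        pred = nearest_centroid_predict(X[train], y[train], X[test])
        scores.append(f1_scores(y[test], pred))
    scores = np.array(scores)
    return scores.mean(axis=0), scores.std(axis=0)

# Toy embeddings: two well-separated role classes (hypothetical data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(10.0, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
mean_f1, std_f1 = repeated_holdout(X, y)
```

Reporting the mean together with the standard deviation over the repeated splits matches the "value ± deviation" form of the entries in the result tables.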
Role detection is one of the important tasks of role-based network embedding. It clusters the embedding into the existing classes, which represent the different roles in the real network. In this experiment, we apply the K-means clustering method to the embedding of the Brazil flights dataset and observe the performance.
We evaluate the clustering extensively using four popular metrics: Adjusted Mutual Information (AMI) [
The results are shown in
Algorithm  AMI  ARI  V-measure  Sil 

Role2vec  0.0403  0.0236  0.0659  0.0634 
FEATHER  0.1284  0.0859  0.1518  0.3311 
RED  0.0284  0.0242  0.0557  0.4981 
RESD  0.4668  0.3953  0.4809  0.2489 
REQF  0.3509  0.2062  0.3728  0.5559 
REQF 
Compared with the best baseline, REQF improves by 16.5% in AMI, 38.9% in ARI, 15.5% in V-measure, and 24.1% in Sil. An intuitive comparison is shown in
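For readers unfamiliar with the clustering metrics, the sketch below computes one of them, ARI (the Adjusted Rand Index), from scratch using its standard pair-counting definition. It illustrates the metric only and is not the evaluation code behind the reported numbers; the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Contingency counts: items that share both the true and the predicted label.
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)   # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all items in one cluster
    return (sum_joint - expected) / (max_index - expected)

# Cluster ids are arbitrary, so a relabeled but identical partition scores 1.0.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true cluster scores below zero.
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the score is corrected for chance, values near zero mean the clustering agrees with the true roles no better than a random assignment, which is how the low ARI values of some baselines in the table should be read.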
For parameters of REQF, we conduct experiments to analyze the sensitivity of all parameters on the Brazil flights network.
To explore the influence of quantum walk, CF, and VAE modules on the REQF, we compare the performance between REQF and REQF without the quantum walk (REF), feature fusion (REQ), and VAE (REQF). The results are shown in
REQF significantly outperforms the other models and shows the best performance. REF is second-best but still inferior to REQF, which indicates the necessity of global role information and the effectiveness of the quantum walk module in capturing it. The results of the variant without VAE and of REF are close, which suggests that the noise reduction of VAE is as important as the quantum walk. REQ has the worst results, especially on the Sil metric, where it is significantly lower than all the others. This indicates that the CF greatly affects the performance of REQF, especially in clustering tasks, and that it is necessary to consider the different distributions of node features and neighborhoods. Hence, every module plays an important role in REQF and improves its overall performance.
To visualize the effect of role clustering, we compare REQF with the baselines on the Brazil flights dataset, chosen for its moderate size and evenly distributed node role classes. We use t-SNE [
As shown in
In this paper, we propose REQF to generate role-based network embedding via quantum walk and its weighted feature fusion, which simultaneously considers node role information, node features, and noise. REQF uses the quantum walk to capture the global and local role information of nodes, leverages its weighted characteristic function for feature fusion, and finally uses VAE to reduce the effect of noise. The experimental results demonstrate the effectiveness and stability of REQF on real-world datasets for downstream tasks. We also demonstrate the importance of each module and explore optimal parameter values. Our work has two limitations. On the one hand, we only use a simple multi-layer perceptron, which may not reduce the effect of noise sufficiently. On the other hand, our proposed method only focuses on homogeneous networks and does not extend to heterogeneous graph networks. In future work, we can try a more efficient deep-learning framework to obtain a better representation of the role-based network, and we can also explore heterogeneous role-based graph networks.
This work was supported in part by the National Natural Science Foundation of China (Grant 62172065) and the Natural Science Foundation of Chongqing (Grant cstc2020jcyjmsxmX0137).
The authors declare that they have no conflicts of interest to report regarding the present study.