Graph Neural Networks (GNNs) play a significant role in tasks on homophilic graphs. Traditional GNNs, built on the homophily assumption, apply low-pass filters over neighboring nodes to aggregate information and produce embeddings. In heterophilic graphs, however, nodes from different categories often connect to each other, while nodes of the same category lie further apart in the graph topology. This characteristic poses challenges to traditional GNNs, leading to the issues of “distant node modeling deficiency” and “failure of the homophily assumption”. In response, this paper introduces the Spatial-Frequency domain Adaptive Heterophilic Graph Neural Network (SFA-HGNN), which integrates adaptive embedding mechanisms in both the spatial and frequency domains to address these issues. For the first problem, we propose the “Distant Spatial Embedding Module”, which selects and aggregates distant nodes via high-order random walk transition probabilities to enhance modeling capability. For the second, we design the “Proximal Frequency Domain Embedding Module”, which constructs adaptive filters to separate the high- and low-frequency signals of nodes and introduces a frequency-domain guided attention mechanism to fuse the relevant information, thereby reducing the noise introduced by the failure of the homophily assumption. We deploy SFA-HGNN on six publicly available heterophilic networks, achieving state-of-the-art results on four of them. Furthermore, we elaborate on the hyperparameter selection mechanism, validate the performance of each module through experimentation, and demonstrate a positive correlation among “node structural similarity”, “node attribute vector similarity”, and “node homophily” in heterophilic networks.

Traditional Graph Neural Networks (GNNs) [

However, the opposite is true in heterophilic networks, where most nodes tend to connect with nodes of different classes and lower similarity in attribute vectors [

To address the above issues simultaneously, this paper proposes the SFA-HGNN model. First, to tackle the “distant node modeling deficiency”, we introduce the concept of structural similarity for distant nodes during the structural encoding stage via high-order random walks originating from each node, which helps identify highly homophilic distant nodes. We establish direct connections between the central node and these distant homophilic nodes to facilitate the discovery of potential neighborhoods, and then obtain spatially adaptive embeddings via attention mechanisms that integrate the distant node information. Second, to address the “failure of the homophily assumption”, we design an adaptive filter that amplifies differences between nodes using high-pass filtering and preserves common features using low-pass filtering [

Specifically, the main contributions of this paper are as follows:

The SFA-HGNN model addresses the challenges of the “distant node modeling deficiency” and the “failure of the homophily assumption” commonly faced by traditional GNNs through the distant spatial embedding module and the proximal frequency embedding module.

SFA-HGNN has been deployed on six common heterophilic networks, achieving state-of-the-art results in four of them. This validates the effectiveness of the proposed model design. Besides, the paper thoroughly discusses the selection mechanism for the hyperparameter set through theory and experiments.

The paper provides experimental evidence for the positive correlation among “node structural similarity”, “node attribute vector similarity”, and “node homophily”, and demonstrates the advantages of the constructed distant homophilic subgraph in enhancing neighborhood homophily and attribute vector similarity. The paper also proves the advantages of the frequency-directed attention mechanism in adaptive learning of high-frequency and low-frequency signals in proximal nodes.

In the context of the homophily graph message-passing framework, neighboring nodes are typically defined as those reachable from the center node within one hop [

GNN Architecture Refinement is a redesign of the AGGREGATE and UPDATE modules in the traditional message passing framework [
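To make the AGGREGATE and UPDATE decomposition concrete, the following minimal sketch (our own illustration, not the paper's code; the function and weight names are hypothetical) implements one mean-aggregation message-passing layer in NumPy:

```python
import numpy as np

def mp_layer(adj, h, w_self, w_neigh):
    """One generic message-passing layer: AGGREGATE (mean over
    neighbors) followed by UPDATE (linear transform + ReLU)."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)   # avoid division by zero
    agg = adj @ h / deg                                # AGGREGATE: mean of neighbor features
    return np.maximum(h @ w_self + agg @ w_neigh, 0)   # UPDATE: combine self and neighborhood

# toy 3-node path graph: 0-1-2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h = np.eye(3)    # one-hot node features
w = np.eye(3)    # identity weights, for illustration only
out = mp_layer(adj, h, w, w)
```

Architecture-refinement methods for heterophily typically replace the mean AGGREGATE above with a scheme that separates or reweights dissimilar neighbors.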

The structural role refers to the structural relationship exhibited by nodes and their neighborhoods in the original graph topology [

Currently, graph neural network models that rely solely on message passing mechanisms can aggregate neighborhood node information based on the original graph topology [

As shown in

In conclusion, simple message passing within n hops alone cannot accurately represent nodes that share the same n-order node embedding tree but possess different structural information. Entities S1 and S2 play different structural roles within their respective connected components: S2 acts as a hub node, taking on the task of connecting the other seven nodes. However, in a traditional MP-GNN, nodes with similar first-order structures are assigned the same representation during one round of message passing. This confuses the roles these nodes play in the graph structure and makes it difficult to effectively model the rich information embedded in structural roles.

Recent studies have shown that supplementing MP-GNN with deterministic distance attributes as structural role information can effectively compensate for the shortcomings of traditional graph neural network models in describing node structural roles. DE-GNN [

In their research, Dai et al. addressed the issue of noisy edges and limited node labels and proposed the RS-GNN model [

However, RS-GNN also has certain limitations. The model assumes that “nodes are more likely to connect with similar nodes,” which underpins training the Link Predictor on the adjacency matrix and the edge reconstruction task, placing a stronger emphasis on connecting similar node pairs. This assumption does not fully hold in heterophilic graphs, giving rise to the phenomenon of “failure of the homophily assumption”. Directly training the Link Predictor on the adjacency matrix of a heterophilic graph (where nodes are more likely to connect with nodes of different types) may harm self-supervised learning: the adjacency matrix of a heterophilic graph inherently contains more heterophilic noise than a typical dataset, and constructing a pretext task directly on it may affect the training of the Link Predictor. In the following sections, we address the issue of heterophilic noise from a perspective better suited to highly heterophilic graph data.

Relevant work has shown that the relationship between node labels and graph structure can serve as a metric for graph homophily [

In above equations,

The range of values for the above metrics is
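As an illustration of these label-based homophily metrics, the sketch below implements the standard edge- and node-homophily ratios (our naming and implementation; the paper's exact definitions are given by its equations):

```python
import numpy as np

def edge_homophily(edges, labels):
    """Edge homophily: fraction of edges whose endpoints share a label."""
    same = [labels[u] == labels[v] for u, v in edges]
    return sum(same) / len(same)

def node_homophily(edges, labels, n):
    """Node homophily: per-node fraction of same-label neighbors,
    averaged over nodes that have at least one neighbor."""
    neigh = [[] for _ in range(n)]
    for u, v in edges:                 # treat edges as undirected
        neigh[u].append(v)
        neigh[v].append(u)
    ratios = [np.mean([labels[i] == labels[j] for j in ns])
              for i, ns in enumerate(neigh) if ns]
    return float(np.mean(ratios))

# toy heterophilic graph: labels alternate around a 4-cycle,
# so every edge joins two differently labeled nodes
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 1, 0, 1]
```

Both metrics lie in [0, 1]; on this toy cycle both evaluate to 0, the fully heterophilic extreme.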

The SFA-HGNN model is structured as follows, as shown in

While the

The structural encoding mechanism focuses on embedding the topological attributes of all nodes within a connected component relative to the central node. Simultaneously, it uses high-order random walk transition probabilities originating from the central node to selectively identify highly homophilic distant nodes. A virtual high-speed link is established between these selected nodes and the central node, creating a directly connected subgraph. Within this subgraph, attention-based message passing fuses topological role and distant homophily information, resulting in spatial embeddings.

If only first-order neighborhood nodes are used for message passing between S1 and S2, they would yield the same node embedding results. However, by encoding structural information into the attribute vectors, taking the shortest path length (SE) as an example, where the color of a node reflects its shortest-path distance from the central node, it becomes possible to express the structural role of the central node based on the topological information provided by background nodes. This encoding of structural information enables node classification and differentiation.

In this paper, the shortest path length and random walk transition probability are used to describe the node role structure. The specific definition is as follows:

The above node

As shown above, SE_{sp} characterizes the shortest-path distance from the central node, while SE_{rw} calculates the higher-order random walk transition probability between nodes
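A minimal sketch of the two structural encodings, assuming an unweighted connected graph (our own illustration; variable names such as `se_sp` and `se_rw` are hypothetical):

```python
import numpy as np
from collections import deque

def se_sp(adj_list, center):
    """Shortest-path encoding: BFS distance from `center`
    (unreached nodes get -1)."""
    dist = [-1] * len(adj_list)
    dist[center] = 0
    q = deque([center])
    while q:
        u = q.popleft()
        for v in adj_list[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def se_rw(adj, center, k):
    """Random-walk encoding: k-step transition probabilities from
    `center`, using the row-normalized adjacency P = D^{-1} A."""
    p = adj / adj.sum(axis=1, keepdims=True)
    vec = np.zeros(len(adj))
    vec[center] = 1.0
    for _ in range(k):
        vec = vec @ p
    return vec

# 4-node path graph 0-1-2-3
adj_list = [[1], [0, 2], [1, 3], [2]]
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
```

On the path graph, a 2-step walk from node 0 splits evenly between returning to node 0 and reaching node 2, illustrating the proximal bias of low-order walks discussed next.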

(1) According to the definition of

(2) According to the definition of the random walk, low-order random walks are constrained by the distribution of proximal nodes and tend to assign higher transition probabilities to them; they exhibit numerical instability, and before the walk converges they fail to cover distal nodes, lacking a global description of the graph.

At the same time, when

As shown in _{seeds} to be distal nodes. The _{k} originating from _{seeds} is then employed to gauge its potential similarity with these distal nodes. The definition is outlined as follows:

_{k} represents the migration probability vector of _{seeds} with respect to the other nodes after

The algorithm aggregates transition probabilities only for the higher-order random walk segment, specifically within the range of

_{far} for neighboring nodes based on the structural similarity _{seeds} is necessary to construct a distal homophilic subgraph denoted as
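The selection of distal candidates can be sketched as follows, under our own simplifying assumptions (connected unweighted graph; the segment bounds `k_lo`/`k_hi`, the proximal hop cutoff `hop`, and the sample size `m` stand in for the paper's hyperparameters):

```python
import numpy as np

def distal_candidates(adj, center, k_lo, k_hi, hop, m):
    """Score nodes by summed transition probabilities over the
    high-order walk segment k in [k_lo, k_hi], then keep the top-m
    scores among nodes more than `hop` hops away (the distal set)."""
    n = len(adj)
    p = adj / adj.sum(axis=1, keepdims=True)   # row-normalized adjacency
    vec = np.zeros(n)
    vec[center] = 1.0
    score = np.zeros(n)
    for k in range(1, k_hi + 1):
        vec = vec @ p
        if k >= k_lo:                          # only the high-order segment counts
            score += vec
    # BFS distances, used to mask out the center and proximal nodes
    dist = np.full(n, np.inf)
    dist[center] = 0
    frontier, d = [center], 0
    while frontier:
        d += 1
        nxt = []
        for u in frontier:
            for v in np.nonzero(adj[u])[0]:
                if dist[v] == np.inf:
                    dist[v] = d
                    nxt.append(v)
        frontier = nxt
    score[dist <= hop] = -1.0                   # exclude proximal nodes
    return np.argsort(score)[::-1][:m]          # indices of top-m distal nodes

# 6-node path graph 0-1-2-3-4-5
adj = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[u, v] = adj[v, u] = 1.0
chosen = distal_candidates(adj, 0, 2, 4, 2, 1)
```

The selected indices would then be wired to the central node to form the distal homophilic subgraph.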

MLP refers to a single-layer feedforward neural network. It takes the linear transformation results of the feature vectors of nodes _{ij}. Subsequently, applying _{ij} between the nodes. This process facilitates information fusion within the distal homophilic subgraph based on an attention mechanism. To summarize, the message passing enables the spatial embedding results, denoted as
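The attention-based fusion described above can be sketched as follows, in the GAT style (our own minimal illustration: the scoring vector `a`, the weight matrix `w`, and the tanh nonlinearity are stand-ins for the paper's single-layer feedforward network and activation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fusion(h, center, neigh, w, a):
    """Attention inside the distal homophilic subgraph: a single-layer
    feedforward net `a` scores each (center, neighbor) pair on the
    concatenated transformed features e_ij, softmax turns the scores
    into weights alpha_ij, and the neighborhood is fused accordingly."""
    hc = h[center] @ w
    scores = []
    for j in neigh:
        hj = h[j] @ w
        e_ij = np.concatenate([hc, hj]) @ a   # raw attention score
        scores.append(np.tanh(e_ij))          # nonlinearity (LeakyReLU in GAT)
    alpha = softmax(np.array(scores))         # normalized attention weights
    return sum(al * (h[j] @ w) for al, j in zip(alpha, neigh))

# toy example: zero scoring vector gives uniform attention
h = np.eye(3)
w = np.eye(3)
a = np.zeros(6)
fused = attention_fusion(h, 0, [1, 2], w, a)
```

With uniform attention the fused result is simply the mean of the two neighbor features, a useful sanity check on the softmax normalization.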

Highly heterophilic nodes often have direct neighbors of different categories. However, graph neural networks based on the homophily assumption smooth node representations with the heterophilic information of neighboring nodes through low-pass filtering, thereby reducing the discriminative power of the representations. To address this issue, it is crucial to design high-pass filters that capture the differences between node representations and the heterophilic information of their neighborhoods, and low-pass filters that ensure common information among neighboring nodes is adequately learned.

In this study, an adaptive filter is defined to separate the low- and high-frequency components contained in node features. Leveraging prior information about the similarity between attribute vectors of distal nodes, a frequency-domain guided attention mechanism is introduced to learn how to aggregate the high- and low-frequency signals in the graph, achieving adaptive message aggregation. We thereby obtain the frequency-domain adaptive embeddings of the proximal nodes.

As shown in the above illustration, when

According to Fourier transform theory, in the spatial domain, the convolution of a filter with a signal, denoted as

Therefore, by substituting the defined high-frequency and low-frequency convolutional kernels from this paper into the above equation, we can derive the forms of the adaptive filter for high-frequency and low-frequency components as follows:

According to the above equation, the low-pass filter designed in this study essentially aggregates node and neighborhood features in a specific proportion, leading to a gradual convergence of node representations. On the other hand, the high-pass filter amplifies the differences between nodes and their neighborhoods, resulting in high-frequency representations that differ from the neighborhood. Both the low-pass and high-pass filters are collectively defined as the adaptive filter in this study. They operate on node features, amplifying either the commonality or distinctiveness between nodes and their neighborhoods, thus capturing intrinsic high-frequency and low-frequency information in node representations.
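One common construction of such a filter pair, shown below as a sketch, uses I + Â as the low-pass and I − Â as the high-pass over the symmetrically normalized adjacency Â (our own illustration of the principle; the paper's exact mixing coefficients may differ):

```python
import numpy as np

def norm_adj(adj):
    """Symmetrically normalized adjacency D^{-1/2} A D^{-1/2}
    (assumes every node has at least one neighbor)."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(adj.sum(axis=1)))
    return d_inv_sqrt @ adj @ d_inv_sqrt

def adaptive_filters(adj, x):
    """Low-pass I + A_hat mixes a node with its neighborhood
    (commonality); high-pass I - A_hat amplifies the difference
    (distinctiveness). Returns (low, high) filtered features."""
    a_hat = norm_adj(adj)
    i = np.eye(len(adj))
    return (i + a_hat) @ x, (i - a_hat) @ x

# two connected nodes with opposite scalar features
adj2 = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
x = np.array([[1.0],
              [0.0]])
low, high = adaptive_filters(adj2, x)
```

On this pair, the low-pass output converges both nodes to the shared component, while the high-pass output keeps their signed difference, matching the commonality/distinctiveness reading above.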

Defining the node features as

To ensure that

In this study, the average attribute vector similarity

In heterophilic graphs, when a node’s

As shown above, the concatenation operation

The described equation shows the weight parameters _{1} and _{2}, where _{l} represents the dimension of the hidden layer, and

The study concatenates the proximal frequency-domain embedding
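The frequency-guided mixing step can be sketched per node as follows (our own minimal illustration: `w1` and `w2` play the role of the weight parameters scoring the low- and high-frequency components, and tanh stands in for the activation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def freq_guided_fusion(z_low, z_high, w1, w2):
    """Per-node attention over the low- and high-frequency embeddings:
    each component is scored through its own weight vector, softmax
    over the two scores yields the mixing coefficients."""
    s_low = np.tanh(z_low @ w1)                         # one score per node
    s_high = np.tanh(z_high @ w2)
    alpha = softmax(np.stack([s_low, s_high], axis=-1)) # (n, 2) weights
    return alpha[..., :1] * z_low + alpha[..., 1:] * z_high

# toy case: zero weights give a 50/50 mix of the two components
z_low = np.array([[2.0, 0.0]])
z_high = np.array([[0.0, 2.0]])
fused = freq_guided_fusion(z_low, z_high, np.zeros(2), np.zeros(2))
```

When trained, the scores let each node lean toward its low- or high-frequency signal, which is the adaptivity the module is named for.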

In this paper, we deploy the SFA-HGNN on six publicly available heterophilic graph datasets, which are described as shown in

|  | Cornell | Wisconsin | Texas | Film | Chameleon | Squirrel |
|---|---|---|---|---|---|---|
| Nodes | 183 | 251 | 183 | 7600 | 2277 | 5201 |
| Edges | 295 | 499 | 309 | 33544 | 36101 | 217073 |
| Features | 1703 | 1703 | 1703 | 931 | 2325 | 2089 |
| Classes | 5 | 5 | 5 | 5 | 5 | 5 |
| _{edge} | 0.30 | 0.21 | 0.11 | 0.22 | 0.23 | 0.22 |
| _{node} | 0.11 | 0.16 | 0.06 | 0.24 | 0.25 | 0.22 |

The

The

In this paper, we refer to the framework outlined by Zheng et al. [

Firstly, the concept of

Secondly,

Thirdly, in this paper, we select frequency-domain and spatial-domain graph neural networks based on the homophily assumption to compare the effectiveness of our proposed model in heterophilic graphs. The specific models are as follows:

Fourthly,

The remaining baseline models were configured according to the parameter settings used by these papers [

The software environment used for the experiments in this paper comprises PyTorch, PyTorch Geometric, and Python 3.8. The hardware environment is a GPU RTX 3090 (24 GB); CPU Intel(R) Xeon(R) Gold 6330 @ 2.00 GHz; and 80 GB RAM.

The experimental results of the model on the above six datasets are shown in

| Core idea | Models | Cornell | Wisconsin | Texas |
|---|---|---|---|---|
| Spatial-frequency | | | | |
| High-order Neighbor Mixing | | 73.51 ± 6.34 | 75.88 ± 4.90 | 77.84 ± 7.73 |
| | | 82.16 ± 4.80 | 86.67 ± 4.69 | 84.86 ± 6.77 |
| Potential Neighbor Discovery | | 60.81 | 64.12 | 67.57 |
| | | 58.7 ± 6.8 | 60.3 ± 7.0 | 63.7 ± 6.1 |
| | | 85.14 ± 6.00 | 86.86 ± 2.62 | 85.23 ± 6.40 |
| GNN architecture refinement | | 81.62 ± 3.90 | 86.98 ± 3.78 | 83.62 ± 5.50 |
| | | 88.03 ± 5.6 | 89.75 ± 6.37 | 88.85 ± 4.39 |
| | | 74.43 ± 10.24 | 69.50 ± 3.12 | 75.41 ± 7.18 |
| | | 66.56 ± 13.82 | 62.50 ± 15.75 | 80.66 ± 1.91 |
| Homophily | | 58.91 ± 8.33 | 58.82 ± 6.06 | 59.73 ± 3.24 |
| | | 56.76 ± 5.70 | 57.06 ± 7.07 | 59.45 ± 6.37 |
| | | 75.95 ± 5.01 | 81.18 ± 5.56 | 82.43 ± 6.14 |
| | | 70.98 ± 8.39 | 70.38 ± 2.85 | 83.28 ± 5.43 |
| Node attributes | | 82.16 ± 7.45 | 85.49 ± 4.99 | 81.08 ± 3.82 |

As shown in

| Models | Film | Chameleon | Squirrel | Average |
|---|---|---|---|---|
| | 67.52 ± 1.20 | 53.60 ± 1.80 | | |
| | 32.22 ± 2.34 | 60.50 ± 2.53 | 43.80 ± 1.48 | 60.58 |
| | 35.86 ± 1.03 | 59.39 ± 1.98 | 37.90 ± 2.02 | 64.47 |
| | 31.63 | 60.9 | 38.14 | 53.86 |
| | 31.4 ± 1.0 | 69.4 ± 1.6 | 58.8 ± 1.4 | 57.05 |
| | 37.08 ± 1.41 | 70.78 | | |
| | 36.53 ± 0.77 | 65.24 ± 0.87 | 48.85 ± 0.78 | 67.14 |
| | 31.59 ± 1.37 | 49.47 ± 2.84 | 42.24 ± 1.20 | 64.99 |
| | 35.41 ± 0.97 | 68.14 ± 1.18 | 52.28 ± 3.61 | 62.53 |
| | 32.72 ± 2.62 | 64.68 ± 2.85 | 53.40 ± 1.90 | 60.09 |
| | 30.16 ± 1.27 | 65.92 ± 2.58 | 49.78 ± 2.06 | 53.89 |
| | 29.74 ± 1.46 | 65.32 ± 2.00 | 46.79 ± 2.08 | 52.52 |
| | 34.23 ± 0.99 | 58.73 ± 1.68 | 41.61 ± 0.74 | 62.36 |
| | 25.26 ± 1.18 | 64.86 ± 1.81 | 47.62 ± 1.27 | 60.40 |
| | 35.79 ± 1.09 | 47.36 ± 2.37 | 29.82 ± 1.99 | 60.28 |

SFA-HGNN integrates the core ideas of adaptive message aggregation and potential neighbor discovery. It designs an adaptive filter and a frequency-domain guided attention mechanism to introduce frequency-domain adaptivity into heterophilic message passing. At the same time, structural encoding and distal homophilic subgraph sampling embed the rich structural information represented by random walk transition probabilities into the nodes, and use it to mine distal highly homophilic nodes as a supplement to the embedding, yielding node representations with both spatial- and frequency-domain adaptivity. The model thus mitigates the noise and over-smoothing introduced by High-order Neighbor Mixing models' unfiltered aggregation of distal nodes; compared with Potential Neighbor Discovery models, it fully exploits the structural role information of nodes and avoids divorcing node classification from the graph topology.

According to the experimental results, SFA-HGNN achieves SOTA results on Cornell, Wisconsin, Texas, and Film; compared with the GCN with

Compared with the

Comparing with

Compared with

Meanwhile, GPNN achieves excellent experimental results on the Chameleon and Squirrel datasets because the _{node} values of the two datasets are 0.25 and 0.22, respectively; compared with the remaining datasets, they have slightly higher homophily and relatively dense edges;

GPNN samples the initial neighborhood nodes and generates initial node sequences with a BFS algorithm, then learns further over these sequences. This mechanism tends to learn proximal node sequences in dense graphs, and the above two datasets provide relatively rich proximal homophily information and dense edges for GPNN to train its pointer network, hence the better results on these datasets.

To demonstrate the effectiveness of the node embedding and classification in this study, we performed t-SNE visualization analysis on the node embedding results of the Wisconsin and Texas datasets after 100 training epochs. The following figure,

As shown in the

This section sets the core hyperparameter value range in combination with prior information, defined as follows:

Effective information transmission radius

Distal node threshold

Distal node sampling ratio (

In summary, the relevant experiments are carried out on the validation-set nodes defined in the previous section, and the results of the above prior-information calculations are shown below:

As shown in

As shown in

As shown in

The overarching principle for setting the range of “

Similarly, using the distal node threshold “

As shown in

This section aims to integrate the aforementioned prior information with the intrinsic characteristics of each dataset. We design four sets of hyperparameters per dataset, adjusting the information density of proximal and distal node connections; the sets are tailored to embed information for nodes at near, middle, middle-long, and long distances. The specific parameter designs are shown in

| Parameters | | | | Sample_account |
|---|---|---|---|---|
| Cha_near | 4 | 3 | 1 | 4% |
| Cha_middle | 4 | 3 | 2 | 4% |
| Cha_middle_long | 4 | 2 | 2 | 2% |
| Cha_long | 4 | 2 | 2 | 5% |
| Squ_near | 3 | 2 | 1 | 1.5% |
| Squ_middle | 3 | 2 | 2 | 1.5% |
| Squ_middle_long | 5 | 3 | 2 | 2% |
| Squ_long | 5 | 3 | 3 | 2% |
| Film_near | 3 | 2 | 2 | 5% |
| Film_middle | 4 | 3 | 2 | 1.5% |
| Film_middle_long | 4 | 3 | 3 | 1.5% |
| Film_long | 6 | 3 | 2 | 1% |

| Parameters | | | | Sample_account |
|---|---|---|---|---|
| Cornell_near | 5 | 3 | 2 | 25% |
| Cornell_middle | 5 | 4 | 3 | 25% |
| Cornell_middle_long | 5 | 3 | 3 | 25% |
| Cornell_long | 5 | 2 | 2 | 10% |
| Texas_near | 5 | 3 | 3 | 20% |
| Texas_middle | 5 | 3 | 2 | 20% |
| Texas_middle_long | 5 | 2 | 3 | 10% |
| Texas_long | 5 | 2 | 2 | 10% |
| Wis_near | 5 | 3 | 3 | 20% |
| Wis_middle | 5 | 3 | 2 | 20% |
| Wis_middle_long | 5 | 2 | 3 | 20% |
| Wis_long | 5 | 2 | 2 | 20% |

The effective information propagation radius

Conversely, the proximal node threshold

Based on the aforementioned combination of parameters, the experimental results are shown in

As shown in

As shown in

The aim of this section is to adapt the information fusion method in the message aggregation process by modifying

As described above, a linear transformation is applied to the concatenated vectors of dimension 2F using a feedforward neural network

For the experiments conducted on the aforementioned datasets, the optimal parameter set defined in

As shown in

On the other hand,

To validate the effectiveness of the spatial domain embedding module in capturing distal node homophily information, a comparison is conducted among three sets of neighboring nodes: first-order neighborhood nodes, the top 5 neighborhood nodes sampled based on attention scores from the Node2Seq model, and distal nodes selected using random walk transition probabilities. The relative homophily measure, _{node}, with respect to the central node is computed for these sets. Utilizing the optimal hyperparameters designed in _{node}.

Moreover, Zhu et al. [_{twohop}, as a baseline. Additionally, the attribute vector similarity, _{far}, between the central node and distal nodes is calculated. By comparing these two measures, the model's preferred nodes are validated to possess not only homophily but also attribute vector similarity. The results of the two aforementioned experiments are presented in

As depicted in

Based on the earlier discussion, second-order neighbors predominantly carry the central node's homophily information. Therefore, they serve as the baseline for highlighting the relationship between the feature vectors of the selected distal nodes and the central node. As shown in

Simultaneously, adhering to the optimal parameter settings defined in _{far} and distal neighborhood _{far}, for various

As mentioned earlier, _{far} and _{far} decrease as

This paper introduces the SFA-HGNN model to address the challenges that traditional GNNs face when applied to heterophilic graphs, specifically the issues of “distant node modeling deficiency” and “failure of the homophily assumption”. To tackle the former, SFA-HGNN employs a “distant spatial embedding module” based on higher-order random walk transition probabilities to sample and aggregate information from distal nodes with high structural similarity, thereby enhancing the model’s ability to capture distal node characteristics. To address the latter, the “proximal frequency domain embedding module” is designed to adaptively learn the high- and low-frequency signals of proximal nodes and fuse the valuable information, reducing the noise that the failure of the homophily assumption introduces into low-pass filters.

The paper concludes by demonstrating the excellent performance of SFA-HGNN on heterophilic network node classification tasks, explaining the theoretical mechanisms behind hyperparameter selection and the effectiveness of each module. The positive correlation among node attribute vector similarity, node homophily, and node structural similarity is validated through experiments. However, the model still has room for improvement. For instance, the structural encoding process is essentially a pre-embedding of node information, and its time complexity is closely tied to the number of nodes and edges in the network. Future work could focus on further improving the model in response to this challenge.

We are grateful to the editors and reviewers for their helpful suggestions, which have greatly improved this paper.

This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. 2022JKF02039).

The authors confirm contribution to the paper as follows: study conception and design: Lanze Zhang, Yijun Gu; data collection: Lanze Zhang, Jingjie Peng; analysis and interpretation of results: Lanze Zhang, Jingjie Peng; draft manuscript preparation: Lanze Zhang, Jingjie Peng. All authors reviewed the results and approved the final version of the manuscript.

The SFA-HGNN model and the relevant experimental data are available from the authors upon reasonable request.

The authors declare that they have no conflicts of interest to report regarding the present study.