Spatio-temporal heterogeneous data is the basis for decision-making in many fields, and checking its accuracy provides reliable data support for those decisions. Because spatiotemporal heterogeneous data exhibits randomness, complexity, and both global and local correlation in the temporal and spatial dimensions, traditional detection methods cannot guarantee both detection speed and accuracy. Therefore, this article proposes a method for detecting the accuracy of spatiotemporal heterogeneous data by fusing graph convolutional and temporal convolutional networks. Firstly, a geographic weighting function is introduced and improved to quantify the degree of association between nodes and to calculate weighted adjacency values, which simplifies the complex topology. Secondly, spatiotemporal convolutional units are designed based on graph convolutional networks and temporal convolutional networks to improve detection speed and accuracy. Finally, the proposed method is compared with three methods, ARIMA, T-GCN, and STGCN, in real scenarios to verify its effectiveness in terms of detection speed, detection accuracy, and stability. The experimental results show that the MAE, MAPE, and RMSE of this method are the smallest in both the simple connected area and the complex connected area, at 13.82/12.08, 2.77/2.41, and 16.70/14.73, respectively. It also has the shortest detection time, 672.31 s/887.36 s. In addition, the evaluation results are consistent across different time periods and in complex topology environments, which indicates that the method has the highest detection accuracy and good research value and application prospects.

Spatio-temporal heterogeneity data (STD) [

With the rapid development of artificial intelligence, deep learning has advanced quickly in the field of data quality control thanks to its powerful spatiotemporal data analysis capability. The key techniques include graph convolutional networks and temporal convolutional networks. Based on graph theory, Graph Convolutional Networks (GCN) [

Therefore, this article proposes a method to detect the accuracy of spatiotemporal heterogeneous data by fusing graph convolution and temporal convolution networks, based on the hybrid intelligent algorithm design idea of “divide and conquer, complementary advantages” [

The innovations in this article include:

1. This article fuses GCN and TCN to solve the problem of feature extraction and analysis for spatiotemporal heterogeneous data.

2. This article proposes an adaptive geographically weighted function for enhancing the saliency of model spatial feature extraction. At the same time, the activation function of the temporal convolutional network is optimized to avoid the overfitting of the network, which in turn enhances the generalization ability of the network.

The method proposed in this article has been applied effectively in real-world scenarios. For example, the Research Institute of an Oilfield Co., Ltd. chose this method to check data accuracy while performing data quality checks on its data interchange platform. In that application, the method ran faster and was more accurate than the original method. It therefore saved considerable time and economic cost for the research institution and promoted the information management of the oilfield.

The rest of this article is organized as follows. Section 2 gives the application scenarios, describes the basic concepts and formulas, and generalizes the key problems. Section 3 expounds on how to deal with complex topology structures. Section 4 elaborates on the operation mechanism and improvement points of FAGTN. Section 5 verifies the validity of the method. Section 6 summarizes the study and outlines future work.

This section defines the scenario and the basic concepts and describes the key problems to be solved.

Define scenario

Since the given task is advanced detection of data, the node attributes in scenario

The basic concepts, formulas, and reasoning processes can be described as follows.

This article randomly selects a detection space

The “optimal” spatiotemporal data detection model is the one that comes as close as possible to both the shortest time and the best results. Shortest time means the shortest possible detection time and can be expressed as

Best results mean “the most accurate possible detection result”, which is reflected in giving accurate nodes’ attribute reference values at the current moment.

In the end, the data accuracy detection problem can be described as the optimizing problem with

This article collectively refers to node regions with prominent spatiotemporal heterogeneity as complex topology structure regions. The more pronounced such structural features are, the lower the model's processing speed and accuracy will be. Therefore, this article preprocesses the complex topology structure to improve detection speed and detection accuracy.

In summary, the problems solved in this article can be described below.

Based on the definition of complex topology structure in the problem description of Section 2.2, this article introduces the concept of a graph to model the complex topology structure and uses the graph structure to describe the complex spatiotemporal relationships between nodes. The problem of “how to deal with complex topology structure” is thus transformed into “how to reflect the strength of node adjacency relationships in the graph structure”.

To reflect the spatiotemporal heterogeneity of the data, this article designed a time-varying graph layer group with several depths, as shown in

According to the 2.1 scenario description, this article defines the graph structure

After modeling the scenario as a graph structure, this article calculates the weight coefficient among the nodes to achieve the operation of “weighting” [

The connectivity between nodes is usually represented by an adjacency matrix. This article defines that

Formula

In the geographic weighting process, this article refers to the detection radius as the bandwidth, which is a trainable parameter whose magnitude affects the function’s slope. The time-varying layer group we built contains several detection spaces with different node locations and connectivity, which makes it impossible to use the same bandwidth value to calculate the weights.

Therefore, this article proposes an adaptive mechanism to determine the bandwidth value, which follows the principle of full coverage of nodes. In each detection space, the center node is taken as the circle center, and the farthest node distance is taken as the radius to form the weight calculation range. Assuming

This article defines the coordinates of the node as

According to the calculation rule for the weight coefficient, this article gives a processing method similar to “weight pruning”. The process is described as follows.
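The adaptive bandwidth and the pruning idea described above can be illustrated with a small sketch. The article's own weighting formula and pruning rule are not reproduced in this text, so the Gaussian kernel, the threshold `min_weight`, and all function names below are assumptions chosen for illustration, not the authors' implementation. Only the bandwidth rule (distance from the center node to the farthest node in the detection space) follows the description given here.

```python
import numpy as np

def adaptive_bandwidth(coords, center):
    """Return distances to the center node and the bandwidth of one detection space.

    The bandwidth is the distance from the center node to the farthest node,
    so that every node in the space is covered.
    """
    dist = np.linalg.norm(coords - coords[center], axis=1)
    return dist, dist.max()

def gaussian_weights(dist, bandwidth):
    """Assumed Gaussian kernel: weight 1 at the center, decaying with distance."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def prune(weights, min_weight):
    """Assumed pruning rule: set weights below the threshold to zero."""
    return np.where(weights >= min_weight, weights, 0.0)

# Hypothetical detection space with four nodes; node 0 is the center node.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
dist, h = adaptive_bandwidth(coords, center=0)
w = gaussian_weights(dist, h)
w_pruned = prune(w, min_weight=0.7)
```

Because the bandwidth is recomputed per detection space, spaces with widely scattered nodes get a flatter kernel and compact spaces get a steeper one, which is the effect on the slope mentioned above.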

The processed complex topology structure can better reflect the strength of the adjacency relationship between nodes. Taking

STD’s accuracy detection is influenced by time series and entity space features. This article constructs the spatiotemporal feature analysis model and gives the model design idea and model framework.

GCN’s core is to define convolution operations on graphs with complex topology structures to achieve spatial feature extraction. TCN’s core lies in performing convolutional operations on temporal data to learn temporal features. On this basis, this article fuses GCN and TCN into a model named FAGTN and makes the improvements described below.

Unlike images, graphs with complex topology structures cannot be convolved in the spatial domain by conventional methods. Therefore, the Fourier transform is introduced to move the graph from the spatial domain to the frequency domain for processing. A scaling operation is performed on each dimension, the adjacent nodes are aggregated to complete the convolution operation, and the result is finally transformed back to the spatial domain. The transformation process is shown in formula
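The spatial-to-frequency-to-spatial round trip described above can be sketched with NumPy. This is a minimal illustration of textbook spectral graph convolution on the normalized graph Laplacian, assuming a symmetric weighted adjacency matrix; the function name and the per-frequency scaling vector `theta` are illustrative, and the article's own formula may differ (practical GCN layers usually replace the eigendecomposition with a polynomial or first-order approximation).

```python
import numpy as np

def spectral_graph_conv(adj, x, theta):
    """Filter a node signal x in the graph frequency domain.

    adj   : (n, n) symmetric weighted adjacency matrix
    x     : (n,) signal with one value per node
    theta : (n,) scaling factor for each frequency component (assumed filter)
    """
    adj = np.asarray(adj, dtype=float)
    x = np.asarray(x, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros(n)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    # Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, u = np.linalg.eigh(lap)   # columns of u form the graph Fourier basis
    x_hat = u.T @ x              # spatial domain -> frequency domain
    y_hat = theta * x_hat        # scale each frequency component
    return u @ y_hat             # frequency domain -> spatial domain

# Hypothetical 3-node path graph.
adj = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])
identity_out = spectral_graph_conv(adj, x, theta=np.ones(3))
zero_out = spectral_graph_conv(adj, x, theta=np.zeros(3))
```

With `theta` set to all ones the filter leaves the signal unchanged, which is a quick sanity check that the forward and inverse transforms are consistent.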

This article introduces and improves TCN to solve the problem of learning temporal features. A one-dimensional fully convolutional network (FCN) structure is adopted, with zero padding to keep the same length between layers. Dilated causal convolutions are added to expand the receptive field exponentially while ensuring that the output at a given time is convolved only with elements at that time and earlier. When training a deeper network structure, the residual connection structure is used to transfer information across layers. This article selects the PReLU activation function to improve the residual module and enhance the model's ability to learn effective temporal features by training the learnable parameters
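The building blocks named above (dilated causal convolution, zero padding, PReLU, and a residual connection) can be sketched for a single channel as follows. This is an illustrative NumPy version, not the article's network: the function names are hypothetical, and the PReLU slope `alpha` is passed in as a fixed number here, whereas in a trained model it would be a learnable parameter.

```python
import numpy as np

def prelu(x, alpha):
    """PReLU: identity for non-negative inputs, slope alpha for negative inputs."""
    return np.where(x >= 0, x, alpha * x)

def causal_dilated_conv(x, kernel, dilation):
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Left zero padding keeps the output the same length as the input and
    guarantees that y[t] never depends on future values of x.
    """
    pad = (len(kernel) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for k, w in enumerate(kernel):
        start = pad - k * dilation
        y += w * xp[start:start + len(x)]
    return y

def residual_block(x, kernel, dilation, alpha):
    """Residual connection around one convolution followed by PReLU."""
    return x + prelu(causal_dilated_conv(x, kernel, dilation), alpha)

# Hypothetical input series and kernel.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, -1.0])
y = causal_dilated_conv(x, kernel, dilation=2)   # y[t] = x[t] - x[t - 2]
out = residual_block(x, kernel, dilation=2, alpha=0.1)
```

Stacking such blocks with dilations 1, 2, 4, and so on is what makes the receptive field grow exponentially with depth.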

The structure of FAGTN (spatiotemporal Feature Analysis Model Fused by GCN and TCN) is shown in

The pseudo-code of FAGTN is shown in Algorithm 1, where “/**/” marks a comment.

This article uses three evaluation metrics to evaluate the accuracy of FAGTN. They are the root mean square error (RMSE) [
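For reference, the three metrics can be computed as follows. The functions use the standard textbook definitions, which are assumed to match the formulas intended here; the sample values are hypothetical and serve only to show the call pattern.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Hypothetical reference values and model outputs.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
scores = {"RMSE": rmse(y_true, y_pred),
          "MAE": mae(y_true, y_pred),
          "MAPE": mape(y_true, y_pred)}
```

RMSE penalizes large individual errors more heavily than MAE, so for any data set RMSE is at least as large as MAE.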

This section verifies the effectiveness of the proposed spatiotemporal heterogeneity data accuracy detection method by comparing its performance indicators with those of similar models. A brief description of the experimental design is given below.

The oilfield data is summarized once a month. This article selected the data between 2008 and 2018, used the first ten years as the training set, and used the rest as the validation set and the test set. This article represented multiple attribute parameters of the data as different detection tasks, and wrote each data item as

This article compared performance indicators of FAGTN, ARIMA, T-GCN, and STGCN in different levels of connectivity and different time periods. The performance indicators include the average detection time and the model accuracy. The aim is to verify that FAGTN has obvious advantages in terms of detection speed, detection accuracy, and stability.

Parameter | Initial learning rate | Batch size | Number of iterations | | |
---|---|---|---|---|---|
Value | 0.001 | 32 | 2 | 0.25 | 50 |

This article tested the average detection time and the model accuracy obtained by the four models in different degrees of connectivity. This aims to observe the influence of the complexity of the connectivity between nodes on the detection speed and accuracy of the models.

In the performance experiment with different degrees of connectivity, this article divided the training set into two parts and defined the node set that satisfies

Results under different degrees of connectivity (simple connected area/complex connected area):

Model | MAE | MAPE (%) | RMSE | Detection time (s) |
---|---|---|---|---|
ARIMA | 23.80/21.16 | 4.75/4.34 | 22.98/20.38 | 15878.62/16692.85 |
T-GCN | 17.91/15.79 | 3.58/3.16 | 21.04/19.13 | 981.35/1013.54 |
STGCN | 15.25/13.03 | 3.05/2.61 | 18.11/15.81 | 776.12/970.03 |
FAGTN | 13.82/12.08 | 2.77/2.41 | 16.70/14.73 | 672.31/887.36 |

By analyzing the experimental results, this article obtained the following conclusions.

(1) The traditional time series model (ARIMA) performed poorly in the experiment. When dealing with STD, this model only considered the temporal characteristics of the data and ignored the spatial characteristics of the data, which is a poor fit for data with prominent spatiotemporal heterogeneity, so the accuracy of the model is lower.

(2) As shown in

(3) Across the two degrees of connectivity, all four models achieved lower errors in complex connected areas, at the cost of longer detection times. This indicates that complex connected areas lend themselves to deeper spatial feature mining.

This article tested the model accuracy obtained by the four models in different periods. This aims to observe the influence of the historical time series length on the detection accuracy of the models.

In the performance experiment with different periods, this article selected all nodes to participate in the experiment. To highlight the influence of the length of the period on the model performance, this article divided the training set into 10 time units by year, and each time unit contained 12 time points corresponding to the 12 months of that year. This article designed a self-increasing time series

Period | Number of time points | Time point details | Period | Number of time points | Time point details |
---|---|---|---|---|---|
| | 12 | 201701–201712 | | 72 | 201201–201712 |
| | 24 | 201601–201712 | | 84 | 201101–201712 |
| | 36 | 201501–201712 | | 96 | 201001–201712 |
| | 48 | 201401–201712 | | 108 | 200901–201712 |
| | 60 | 201301–201712 | | 120 | 200801–201712 |
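The periods listed in the table can be generated programmatically. The sketch below assumes, based on the table, that every period ends in December 2017 and grows backwards by one year (12 monthly time points) at a time; the function name `build_periods` and its parameters are hypothetical.

```python
def build_periods(end_year=2017, n_units=10, points_per_unit=12):
    """Return (number of time points, first month, last month) for each period.

    Period k covers the last k years of the training set, so it contains
    k * points_per_unit monthly time points ending in December of end_year.
    """
    periods = []
    for k in range(1, n_units + 1):
        start_year = end_year - k + 1
        periods.append((k * points_per_unit, f"{start_year}01", f"{end_year}12"))
    return periods

periods = build_periods()
for n_points, first, last in periods:
    print(n_points, f"{first}-{last}")
```

Each tuple corresponds to one row of the table, so the same list can drive the loop that retrains and evaluates the models on progressively longer histories.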

Comparing the experimental performance of the four models, the experimental results are shown in

By analyzing the experimental results, this article obtained the following conclusions.

(1) As shown in

(2) Compared with STGCN, the best-performing baseline, FAGTN keeps a clear advantage in all periods. As the historical time series grows longer, the accuracy of FAGTN remains the highest, indicating that FAGTN maintains its advantage when processing long historical time series.

This article selected STGCN, the best-performing baseline, as the comparison model and tested the models’ accuracy across different detection tasks and different numbers of experiments in order to analyze the models’ stability. The types of detection tasks and the number of experiments were taken as variables, and the control variable method was used to test the changes in model accuracy.

For the different detection tasks, this article performed 5 groups of experiments, selected a different data attribute as the detection task in each group, and repeated each group 10 times. The reported result is the average over all runs.

For the different numbers of experiments, this article performed 50 groups of experiments, adding one experiment for each new group (the first group had one experiment), and took the average of the experimental results of each group.

As a supplementary instruction,

Experimental project | Indicators | STGCN | FAGTN |
---|---|---|---|
Different detection tasks | Accuracy interval | [0.9346, 0.9458] | [0.965, 0.9694] |
Different detection tasks | Maximum difference (%) | 1.12 | 0.44 |
Different detection tasks | Minimum difference (%) | 0.04 | 0.01 |
Different detection tasks | Standard deviation | 0.0044 | |
Different number of experiments | Accuracy interval | [0.9391, 0.9508] | [0.9651, 0.9693] |
Different number of experiments | Maximum difference (%) | 1.17 | 0.42 |
Different number of experiments | Minimum difference (%) | 0 | 0 |
Different number of experiments | Standard deviation | 0.0032 | |
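The stability indicators in the table can be computed from a list of accuracy values along the following lines. The definitions are assumptions: the maximum and minimum differences are taken as the largest and smallest absolute difference between any two results, in percentage points, and the standard deviation is the population standard deviation. The maximum-difference reading agrees with the table, where each reported maximum difference equals the width of the corresponding accuracy interval (for example, 0.9458 - 0.9346 = 0.0112, i.e. 1.12). The input values below are hypothetical.

```python
from itertools import combinations

import numpy as np

def stability_indicators(accuracies):
    """Summarize how much a model's accuracy varies across experiments."""
    acc = np.asarray(accuracies, dtype=float)
    # Absolute difference between every pair of results.
    diffs = [abs(a - b) for a, b in combinations(acc, 2)]
    return {
        "interval": (float(acc.min()), float(acc.max())),
        "max_diff_pct": float(max(diffs) * 100),
        "min_diff_pct": float(min(diffs) * 100),
        "std": float(acc.std()),  # population standard deviation (assumed)
    }

# Hypothetical accuracies from four experiment groups.
result = stability_indicators([0.95, 0.96, 0.955, 0.96])
```

A narrower interval and a smaller standard deviation both indicate that the model's accuracy changes less from one detection task or repetition to the next.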

By analyzing the experimental results of the model’s stability, this article obtained the following conclusions.

(1) As shown in

(2) As shown in

This article compared and analyzed the following indicators before and after handling complex topology structures.

This experiment analyzes the effect of handling complex topology structures on detection accuracy and detection speed. The number of nodes is the variable (the nodes in the experiment can form multiple detection spaces), and the models are compared with different numbers of nodes. For accuracy, this article performed 100 groups of experiments, adding 10 nodes for each new group (the first group had 100 nodes), and took the average of the experimental results of each group. For detection speed, it performed 5 groups of experiments, adding 50 nodes for each new group (the first group had 100 nodes). The experimental results are shown in

By analyzing the experimental results, this article obtained the following conclusions.

(1) As shown in

(2) As shown in

This article proposed a spatiotemporal heterogeneity data accuracy detection method that fuses graph convolutional networks and temporal convolutional networks and is divided into two main stages. In the first stage, the geographic weighting function is improved, which simplifies the complex topology. In the second stage, the spatiotemporal feature analysis model (FAGTN) is designed based on GCN and TCN to improve detection speed and accuracy. The work is summarized as follows.

The problem of STD accuracy detection cannot be ignored. When the method proposed in this article is applied to a real scenario, some more specific problems need to be solved, for example, how to incorporate the influence of the nodes’ attribute values and how to model the nodes quickly. These problems remain to be resolved in future work.

This work was supported by the National Natural Science Foundation of China under Grant 42172161, by the Heilongjiang Provincial Natural Science Foundation of China under Grant LH2020F003, by the Heilongjiang Provincial Department of Education Project of China under Grant UNPYSCT-2020144, by the Innovation Guidance Fund of Heilongjiang Province of China under Grant 15071202202, and by the Science and Technology Bureau Project of Qinhuangdao Province of China under Grant 202101A226. The authors would like to express their gratitude to the anonymous reviewers, whose helpful comments and suggestions improved the quality of this article.

The authors declare that they have no conflicts of interest to report regarding the present study.