In the power distribution system, missing or incorrect records of the user-transformer relationship (UTR) in low-voltage station areas (LVSAs) hinder the lean management of LVSAs as well as the operation and maintenance of the distribution network. To improve the lean management of LVSAs, this paper proposes a UTR identification method based on the Local Selective Combination in Parallel Outlier Ensembles (LSCP) algorithm. First, the voltage data are reconstructed based on information entropy to highlight the differences between users. Then, the LSCP algorithm combines four base outlier detection algorithms, namely Isolation Forest (IForest), One-Class Support Vector Machine (OCSVM), Copula-Based Outlier Detection (COPOD), and Local Outlier Factor (LOF), to construct the UTR identification model. This model accurately detects differences in users' voltage data and identifies users with wrong UTR. Meanwhile, the key input parameter of the LSCP algorithm is determined automatically from the line loss rate, which reduces the influence of manual settings on identification accuracy. Finally, the method is verified in actual LVSAs, where it achieves 100% recall and precision and outperforms the compared methods. Its applicability to LVSAs where data acquisition is difficult, and the effect of voltage data errors in transmission, are also analyzed. The proposed method adopts an ensemble learning framework and does not require a manually set detection threshold. It is applicable to LVSAs with difficult data acquisition and high voltage similarity, and it improves the stability and accuracy of UTR identification in LVSAs.
The user-transformer relationship (UTR) refers to the subordinate relationship between an end user's electricity meter and the transformer in the low-voltage station area (LVSA). The UTR of LVSA in China is shown in
To solve problems such as abnormal line loss caused by wrong UTRs, the UTR in the LVSA must be identified. Traditional engineering methods, including the instantaneous power outage method [
In recent years, with the spread of smart meters, the acquisition system has accumulated massive amounts of users' electrical data, providing a basis for UTR identification in the LVSA [
Similarity of electrical data: the similarity of all customer voltage data was measured based on the Pearson correlation coefficient [
Conservation of electrical data: In [
Multisource data: In [
Other methods: In [
UTR identification  Key words  References 

Similarity of electrical data  Pearson correlation coefficient  [ 
Discrete Frechet distance  [ 

Clustering  [ 

Outlier detection  [ 

Conservation of electrical data  Power conservation  [ 
Current conservation  [ 

Multisource data  Voltage data and current data  [ 
Voltage data and power data  [ 

Other methods  Prior knowledge  [ 
Knowledge graph  [ 
Owing to the voltage distribution characteristics of the low-voltage distribution network (LVDN), voltage variations of users in different LVSAs show low similarity, because these users are supplied by different outgoing lines and are electrically distant from each other [
The differing electrical characteristics of LVSAs, the high similarity of voltage data between some LVSAs, and limited data lead to low accuracy, weak applicability, and low reliability in existing UTR identification methods. To solve these problems, this paper proposes a UTR identification method based on the Local Selective Combination in Parallel Outlier Ensembles (LSCP) algorithm. The main contributions of the paper are as follows:
Information entropy is used to reconstruct the original voltage data, highlighting the differences between users' data and reducing the impact of data noise.
Based on the voltage characteristics of the LVDN, a UTR identification method built on the ensemble LSCP framework is proposed, which increases the accuracy and stability of UTR identification results in the LVSA.
Based on the historical line loss rate data of the LVSA, the key input parameter of the proposed model is determined automatically. This increases the accuracy of the identification result, avoids manual parameter setting, and makes the method better suited to UTR identification across large numbers of LVSAs.
The rest of this paper is organized as follows.
The core function of outlier detection is to identify data whose patterns differ from those of most data in the input dataset. Compared with outlier detection algorithms in general ensemble frameworks, the LSCP algorithm is a fully unsupervised outlier detection algorithm that combines multiple outlier detection algorithms in parallel [
To strengthen the stability and robustness of UTR identification results in the LVSA, we adopt the LSCP algorithm to identify the UTR.
The specific flow of outlier data detection by the LSCP algorithm is as follows:
Train base outlier detector
Enter the voltage dataset
Define data local space
For each piece of voltage data
T rounds of random sampling are carried out to obtain T feature subspaces. In the adjacent samples of the feature subspace of data
The cardinality b of the local space of each data
Calculate the local anomaly score of data
After the local region of voltage data
Pseudo target
The maximum score across the R base anomaly detectors is taken as the pseudo target of the data samples in
Choose base detection model
The Pearson Correlation Coefficient is calculated between the local anomaly score matrix
Anomaly score of data
According to the selected
Judge outlier data
After obtaining the anomaly scores matrix
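The flow above can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the base-detector scores are assumed to be precomputed (one list per detector, for example from the four base detectors introduced below), the local region is built on the same data being scored, and only the single best-correlated detector is kept for each sample. The arguments k, T, d_min, d_max and min_count are assumed to play the roles of the KNN neighbour count, random sampling times, lowest and highest sampling dimension, and sample-count threshold listed in the parameter table. Ready-made implementations also exist, for example the LSCP model in the PyOD library.

```python
import math
import random
import statistics
from collections import Counter


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def zscore(col):
    """Standardize one detector's scores so that detectors are comparable."""
    m, s = statistics.fmean(col), statistics.pstdev(col)
    return [(v - m) / s if s else 0.0 for v in col]


def local_region(X, i, k, T, d_min, d_max, min_count, rng):
    """Samples that are among the k nearest neighbours of X[i] in at least
    min_count of T randomly sampled feature subspaces."""
    d = len(X[0])
    counts = Counter()
    for _ in range(T):
        dims = rng.sample(range(d), rng.randint(d_min, d_max))
        nearest = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in dims), j)
            for j in range(len(X)) if j != i
        )[:k]
        counts.update(j for _, j in nearest)
    region = [j for j, c in counts.items() if c >= min_count]
    # Fall back to the k most frequent neighbours if the threshold is too strict.
    return region or [j for j, _ in counts.most_common(k)]


def lscp_scores(X, base_scores, k=5, T=10, d_min=1, d_max=None, min_count=5, seed=0):
    """base_scores: R lists (one per base detector) of n anomaly scores each."""
    rng = random.Random(seed)
    d_max = d_max or len(X[0])
    S = [zscore(col) for col in base_scores]
    final = []
    for i in range(len(X)):
        region = local_region(X, i, k, T, d_min, d_max, min_count, rng)
        # Pseudo target: maximum standardized score over the R detectors.
        target = [max(s[j] for s in S) for j in region]
        # Keep the detector whose local scores correlate best with the target.
        best = max(S, key=lambda s: pearson([s[j] for j in region], target))
        final.append(best[i])
    return final


def flag_outliers(scores, contamination):
    """Label the ceil(contamination * n) highest-scoring samples as 1 (wrong UTR)."""
    m = math.ceil(round(contamination * len(scores), 9))
    top = set(sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:m])
    return [1 if i in top else 0 for i in range(len(scores))]
```

The ceiling rule in `flag_outliers` is an assumption; it is used here because it reproduces the flagged-user counts reported for the 112 users of simulation scenario 1 (for example, 0.04 × 112 = 4.48 gives five users).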
Parameter symbol  Parameter description 

Proportion of outlier data  
Number of KNN nearest neighbor samples  
Random sampling times  
Lowest dimension of random feature sampling  
Highest dimension of random feature sampling  
Threshold of sample number in feature subspace 
The proportion of outlier data
In this paper, the ensemble learning LSCP algorithm is combined with heterogeneous base outlier detection models, and four classical outlier detection algorithms are used as base models: Isolation Forest (IForest), One-Class Support Vector Machine (OCSVM), Copula-Based Outlier Detection (COPOD), and Local Outlier Factor (LOF). The four base detectors detect anomalies from different perspectives, learn different characteristics of the data, and improve the reliability and stability of the detection model.
The Isolation Forest algorithm detects outliers in a dataset by random partitioning. The fewer partitions needed to isolate a sample, the more easily it is isolated and the higher its anomaly degree. The anomaly score of each sample is shown in
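Since the score equation is not reproduced here, the following sketch computes the standard Isolation Forest anomaly score s(x, n) = 2^(-E[h(x)]/c(n)), where E[h(x)] is the mean path length of sample x over the isolation trees and c(n) is the average path length of an unsuccessful binary-search-tree lookup among n samples. It assumes the paper uses this standard form; the function names are hypothetical, and tree construction is omitted.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search among n samples,
    used to normalize isolation-tree path lengths."""
    if n > 2:
        harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
        return 2.0 * harmonic - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def iforest_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values near 1 indicate anomalies,
    values well below 0.5 indicate normal samples."""
    return 2.0 ** (-mean_path_length / average_path_length(n))
```

A sample whose mean path length equals c(n) scores exactly 0.5, and shorter paths (earlier isolation) push the score toward 1.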
OCSVM is an outlier detection algorithm designed for imbalanced samples. It maps the original data into a high-dimensional space through a kernel function, where normal and abnormal data differ significantly, and constructs a hyperplane to separate them. The decision function used to judge whether a sample is abnormal is shown in
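A minimal sketch of the standard one-class SVM decision function f(x) = sgn(Σ_i α_i K(x_i, x) - ρ) with a Gaussian (RBF) kernel follows. The support vectors, coefficients α_i, offset ρ and kernel width γ are hypothetical inputs that would normally come from training; the training step itself is not shown.

```python
import math


def rbf_kernel(u, v, gamma):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def ocsvm_decision(x, support_vectors, alphas, rho, gamma):
    """Return +1 (normal) if sum_i alpha_i K(x_i, x) - rho >= 0, else -1 (abnormal).
    support_vectors, alphas and rho are assumed to come from a trained model."""
    value = sum(a * rbf_kernel(sv, x, gamma)
                for sv, a in zip(support_vectors, alphas)) - rho
    return 1 if value >= 0 else -1
```

Points close to the support vectors produce a large kernel sum and fall on the normal side of the hyperplane; distant points fall below ρ and are labelled abnormal.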
COPOD performs anomaly detection with statistical methods. To handle diverse data distributions and multidimensional data, it estimates the tail probability of each data point from empirical cumulative distribution functions and uses the skewness of each dimension's distribution to correct these tail probabilities, thereby estimating the anomaly degree of the data. The anomaly score of data x is shown in
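The following is a simplified sketch of the COPOD scoring idea, written from the algorithm's general description rather than from this paper's equation: for each dimension, left- and right-tail empirical probabilities are converted to negative log-probabilities, a skewness-corrected tail is chosen per dimension, and the score is the largest of the three sums. Function names are hypothetical; only the standard library is used.

```python
import math
import statistics


def skewness(col):
    """Population skewness of one dimension (0.0 for constant data)."""
    m, s = statistics.fmean(col), statistics.pstdev(col)
    return 0.0 if s == 0 else sum(((v - m) / s) ** 3 for v in col) / len(col)


def copod_scores(X):
    """Anomaly score per row: max of left-tail, right-tail and
    skewness-corrected sums of -log(empirical tail probability)."""
    n, d = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    skews = [skewness(c) for c in cols]
    scores = []
    for row in X:
        left = right = skew_corr = 0.0
        for j, col in enumerate(cols):
            u_left = sum(v <= row[j] for v in col) / n   # empirical left tail
            u_right = sum(v >= row[j] for v in col) / n  # empirical right tail
            left -= math.log(u_left)
            right -= math.log(u_right)
            # Negative skew: use the left tail; otherwise the right tail.
            skew_corr -= math.log(u_left if skews[j] < 0 else u_right)
        scores.append(max(left, right, skew_corr))
    return scores
```

Both tail probabilities are at least 1/n because each point counts itself, so the logarithms are always defined.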
LOF is density-based: it detects anomalies by comparing the local density of each point with that of its neighboring points. The LOF algorithm calculates the density through the
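A compact sketch of LOF as it is usually defined (k-distance, reachability distance, local reachability density lrd, and the ratio of the neighbours' lrd to the point's own lrd) is given below. It is illustrative only: the function name is hypothetical, and it assumes that no point has k or more exact duplicates, which would make a reachability sum zero.

```python
import math


def lof_scores(X, k):
    """Local Outlier Factor of every point; about 1 for inliers, much larger for outliers."""
    n = len(X)
    dist = [[math.dist(a, b) for b in X] for a in X]
    k_dist, neighbours = [], []
    for i in range(n):
        order = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = order[k - 1][0]                                 # k-distance of point i
        k_dist.append(kd)
        neighbours.append([j for d, j in order if d <= kd])  # ties included
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]
```

An isolated point has a low lrd compared with its neighbours, so its LOF is well above 1.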
Users in different LVSAs have voltage data with different trends and characteristics, and the proportion of users with wrong UTR is very small. Therefore, UTR identification can be formulated as outlier detection on imbalanced samples. To ensure accurate and reliable results, this paper adopts the ensemble LSCP algorithm to identify the relationship between users and transformers in the LVSA.
At present, the data in the user information collection system of the LVSA include voltage, current, electricity consumption, and power. In this paper, voltage data are selected as the input of the LSCP algorithm, because the voltage variation patterns of users in different LVSAs are dissimilar owing to electrical distance and different outgoing lines [
As can be seen from
Meanwhile, to highlight the differences in users' voltage data, this paper reconstructs the voltage data based on information entropy. Information entropy here reflects the degree of ordering or complexity of voltage data at different times. A large information entropy of voltage data at a certain time indicates that the user's voltage data at that time differ greatly from those of other users. The user's voltage data
Voltage information entropy at different moments can be defined as follows [
Based on the information entropy of voltage data, the voltage data reconstruction coefficient is calculated, as shown in
The principle of the LSCP-based UTR identification method is as follows: the entropy-reconstructed voltage data of all users in the LVSA are input, the LSCP algorithm detects abnormal trends in the voltage data and judges their degree of abnormality, and an anomaly score is obtained for the voltage data of each user. The users with the highest anomaly scores are regarded as users with wrong UTR, with output 1; the other users are regarded as normal users, with output 0. Among the input parameters of the LSCP algorithm, the proportion of outlier data is the key parameter, because it controls the number of users identified as having wrong UTR. In UTR identification, the proportion of users with wrong UTR is unknown, so setting the outlier data proportion to a fixed value would affect the accuracy and efficiency of the results. This paper therefore selects this key input parameter from the perspective of the line loss rate, which prevents manually set parameters from affecting the identification results and improves the practicability of the proposed method. The procedure for determining the key input parameter is as follows:
Input different values of the outlier data proportion and use the LSCP algorithm to obtain the serial numbers of users with wrong UTR. In this paper, the outlier data proportion is traversed over the interval [0, 0.10] in steps of 0.01. Because each LVSA in this paper has about 100 users, a step of 0.01 increases the number of users output with wrong UTR by about one per step. A step of 0.005 would increase the amount of calculation, whereas a step of 0.02 changes the parameter too coarsely to find the optimal value.
The line loss rate of the LVSA is calculated from the total electricity consumption of the users with correct UTR.
Because every LVSA has some inherent line loss, a minimum line loss rate threshold should be set according to the historical line loss records of the system or the characteristics of the actual LVSA.
If the line loss rate calculated for a given input parameter is lower than the minimum line loss threshold, the UTR is considered to still contain errors. As the input outlier data proportion increases, more users are output as having wrong UTR, and users with correct UTR may be misjudged as abnormal, which further reduces the total power consumption counted in the LVSA and increases the line loss rate. Therefore, among the input values whose line loss rate is not lower than the minimum threshold, the one yielding the lowest line loss rate is selected as the optimal parameter. The flowchart is shown in
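The parameter-selection procedure above can be sketched as follows. The function names are hypothetical, the line loss rate is taken as (supply - consumption) / supply × 100%, which is the conventional definition, and the 2.0% threshold in the usage line is an assumed value that the text does not state. The example feeds in the scenario 1 line loss rates reported later in this section for proportions 0.02 to 0.06.

```python
def line_loss_rate(supply_kwh, consumption_kwh):
    """Line loss rate in percent: (supply - total consumption) / supply * 100."""
    return (supply_kwh - sum(consumption_kwh)) / supply_kwh * 100.0


def loss_after_removal(supply_kwh, consumption_kwh, flags):
    """Line loss rate counting only users labelled 0 (correct UTR)."""
    kept = [c for c, f in zip(consumption_kwh, flags) if f == 0]
    return line_loss_rate(supply_kwh, kept)


def candidate_grid():
    """Outlier proportions 0, 0.01, ..., 0.10 (rounded to avoid float drift)."""
    return [round(0.01 * i, 2) for i in range(11)]


def select_contamination(loss_by_param, min_loss_rate):
    """Among proportions whose loss rate is not below the threshold, return the
    one with the lowest loss rate; None if no candidate qualifies."""
    feasible = {p: r for p, r in loss_by_param.items() if r >= min_loss_rate}
    return min(feasible, key=feasible.get) if feasible else None


scenario1 = {0.02: 1.05, 0.03: 1.39, 0.04: 2.25, 0.05: 2.45, 0.06: 5.02}
print(select_contamination(scenario1, min_loss_rate=2.0))  # 0.04 (assumed threshold)
```

With the assumed 2.0% threshold, 0.02 and 0.03 are rejected and 0.04 has the lowest remaining rate. Note also how a mis-assigned user distorts the balance: counting a user supplied by another transformer can drive the computed loss rate negative, and removing that user restores a plausible value.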
The specific implementation steps of the UTR identification method based on the proposed method are as follows:
Data processing
Collect complete data from the user information collection system, including 2 days of voltage data and 10 days of daily electricity consumption data, and reconstruct the voltage data based on information entropy.
Determine the outlier data proportion
The input parameter outlier data contamination
User-transformer relationship identification
Input the optimal parameter, use the LSCP algorithm to identify the UTR, and obtain the users with wrong UTR in the LVSA.
On-site verification
Staff verify the users with abnormal UTR on site and promptly update the UTR records in the system.
The electricity consumption and voltage data of Nanjing, Jiangsu Province, in April 2020 are selected as the dataset. The voltage data are used to identify the relationship between transformers and users, and the electricity consumption data are used to determine the optimal input parameter of the algorithm. Experiments are carried out in actual LVSAs to verify the effectiveness and applicability of the proposed method. The structure diagram of the LVSAs is shown in
Three simulation scenarios are set up to verify the proposed UTR identification method. Simulation scenario 1: LVSA 1 has 107 users, and five users randomly selected from LVSA 2 are placed in LVSA 1 as users with wrong UTR. Simulation scenario 2: LVSA 2 has 89 users, and three users randomly selected from each of LVSA 1 and LVSA 3 are placed in LVSA 2 as users with wrong UTR. Simulation scenario 3: there are seven LVSAs in total, and five users randomly selected from one LVSA are assigned to each of the other six LVSAs as users with wrong UTR. Recall and precision are used to evaluate the performance of the method. The calculation of the two indexes is shown in
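Since the two formulas are not reproduced here, the sketch below computes recall and the standard precision (true positives over flagged users), plus the share of all users classified correctly. The reported percentages appear to follow the latter: in the single-detector comparison, Isolation Forest flags 12 of 112 users, 5 of them correct, and its listed 93.75% equals 105/112 rather than the standard precision 5/12. This reading is an inference from the reported numbers; the function name is hypothetical.

```python
def evaluate(flagged, true_wrong, n_users):
    """Return (recall, standard precision, share of all users classified correctly)."""
    flagged, true_wrong = set(flagged), set(true_wrong)
    tp = len(flagged & true_wrong)      # wrong-UTR users that were flagged
    fp = len(flagged - true_wrong)      # normal users flagged by mistake
    fn = len(true_wrong - flagged)      # wrong-UTR users that were missed
    recall = tp / len(true_wrong)
    precision = tp / len(flagged) if flagged else 0.0
    correct_ratio = (n_users - fp - fn) / n_users
    return recall, precision, correct_ratio


iforest_flags = [8, 34, 56, 59, 96, 98, 100, 108, 109, 110, 111, 112]
print(evaluate(iforest_flags, range(108, 113), 112))
```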
The State Grid Corporation of China defines an LVSA with abnormal line loss as one in which the abnormal line loss lasts more than 10 days. In simulation scenario 1, the average power consumption of users in the LVSA over 10 days is used to calculate the line loss rate and determine the proportion of outlier data. The input parameter is traversed over [0, 0.10] in steps of 0.01, and each calculation yields the UTR membership and the LVSA line loss rate. The LVSA line loss rate is shown in
Proportion of outlier data  ID of abnormal user  Line loss rate/%  Optimum parameter 

0.02  108,109,112  1.05%  no 
0.03  108,109,110,112  1.39%  no 
0.04  108,109,110,111,112  2.25%  yes 
0.05  56,108,109,110,111,112  2.45%  no 
0.06  56,98,108,109,110,111,112  5.02%  no 
Due to limited space, only the calculation results of parameters in the interval [0.02,0.06] are listed in
Simulation scenarios 1 and 2 are used to verify the applicability and effectiveness of the LSCP-based UTR identification method. According to the calculation rules in
Parameter symbol  Parameter value 

30  
20  
24  
48  
15 
In practice, when LVSAs are electrically close to each other, the voltage data of users in different LVSAs can be highly similar. In simulation scenarios 1 and 2, the Pearson correlation coefficient is used to measure the similarity of voltage data between users with normal UTR and users with abnormal UTR in the LVSA. The mean value of the correlation coefficient is shown in
As can be seen from
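The similarity measure above can be sketched as the mean Pearson coefficient between the voltage curves of the two user groups. The averaging scheme (over all normal/abnormal user pairs) is an assumption, since the text does not specify it, and the function names and voltage values in the usage lines are hypothetical.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length voltage curves."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def mean_cross_correlation(group_a, group_b):
    """Mean Pearson coefficient over all (curve in group_a, curve in group_b) pairs."""
    return statistics.fmean(pearson(a, b) for a in group_a for b in group_b)


normal = [[230, 231, 232, 231], [229, 230, 231, 230]]
abnormal = [[232, 231, 230, 231]]
print(mean_cross_correlation(normal, abnormal))
```

A mean close to 1 between the two groups would indicate the high-similarity case described above, where correlation alone cannot separate users of different LVSAs.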
Simulation scenario  ID of abnormal user  Recall ratio/%  Precision/% 

Simulation scenario 1  108, 109, 110, 111, 112  100%  100% 
Simulation scenario 2  90, 91, 92, 93, 94, 95  100%  100% 
In
To further illustrate the advantage of the proposed method for UTR identification in LVSAs, it is compared with each single outlier detection algorithm. The comparison results are shown in
Method  Recall ratio/%  Precision/%  ID of abnormal users 

Isolation Forest  100%  93.75%  8,34,56,59,96,98,100, 108,109,110,111,112 
One-Class Support Vector Machine  100%  93.75%  5,9,34,56,59,98,100, 108,109,110,111,112 
Copula-Based Outlier Detection  100%  93.75%  3,8,10,34,56,96,100, 108,109,110,111,112 
Local Outlier Factor  100%  93.75%  30,33,38,40,88,98,103, 108,109,110,111,112 
The proposed method  100%  100%  108,109,110,111,112 
According to the test results in
In addition, to verify the universality and reliability of the proposed method, two days of voltage data are used in simulation scenario 3 to identify the UTR of six LVSAs. The test results are shown in
LVSA  Number of actual users  Recall ratio/%  Precision/% 

LVSA 1  143  100%  100% 
LVSA 2  116  100%  100% 
LVSA 3  124  100%  100% 
LVSA 4  156  80%  98.71% 
LVSA 5  87  100%  100% 
LVSA 6  134  100%  100% 
In
To verify the method for practical application, the influence of the number of days of voltage data and of voltage data errors in transmission on the identification results is examined.
In simulation scenario 1, voltage data covering different numbers of days are used as input to identify the relationship between users and the transformer, in order to evaluate how the number of days affects the identification result, as shown in
Number of days of input voltage data/day  Recall ratio/%  Precision/% 

1  60%  98% 
2  100%  100% 
3  100%  100% 
4  100%  100% 
5  100%  100% 
It can be seen from
Because electricity data contain random errors from measurement and transmission, the influence of these errors on the identification results of the proposed method is verified. In simulation scenario 1, the measured voltage data of the LVSA for the 30 days of April (which contain random measurement errors) are selected, and every two days of data form one group, giving 15 groups. The average test results over the 15 groups are shown in
Simulation scenario  Number of total users  Number of abnormal users  Average recall ratio/%  Average precision/% 

Simulation scenario 1  112  5  98.66%  99.88% 
Among the 15 groups tested, only the third group has a recall rate of 80% and a precision rate of 98.21%, while all other groups reach 100% recall and precision. The average recall rate is 98.66% and the average precision rate is 99.88%. Therefore, random errors in voltage measurement have little influence on the accuracy of the identification results, which further demonstrates the practicability and anti-interference capability of the proposed method.
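The reported averages follow directly from the per-group results, with fourteen groups at 100% and the third group at 80% recall and 98.21% precision:

$$\bar{R} = \frac{14 \times 100\% + 80\%}{15} = 98.6\overline{6}\%, \qquad \bar{P} = \frac{14 \times 100\% + 98.21\%}{15} \approx 99.88\%$$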
This paper proposes an ensemble learning LSCP algorithm to identify the relationship between users and the transformer, which provides a new approach to UTR correction. The proposed method contributes to the lean management of the LVSA, which is of great significance to the economic operation of the LVDN. The effectiveness of the proposed method is verified in three designed simulation scenarios. Building the UTR identification model on the ensemble LSCP algorithm improves the accuracy and reliability of the identification results, and the recall and precision of this method can reach 100%. The proposed method can accurately identify the UTR using only two days of voltage data, which reduces dependence on data, makes it applicable to LVSAs where data acquisition is difficult, and reduces computational cost. Even when users in different LVSAs have highly similar voltages, the method can still identify the UTR accurately and meet the requirements of practical application. In addition, the optimal key input parameter of the algorithm is determined automatically from the line loss rate, so the method adapts to the characteristics of different LVSAs and is more practical. A limitation of the proposed method is that it can only detect users with wrong UTR in an LVSA; it cannot automatically determine which LVSA these users actually belong to. Future work will study how to achieve UTR identification based on voltage characteristics when photovoltaic and other renewable generation is connected to the LVSA.
User-transformer relationship
Low-voltage station area
Local Selective Combination in Parallel Outlier Ensembles algorithm
Isolation Forest
One-Class Support Vector Machine
Copula-Based Outlier Detection
Local Outlier Factor
Dynamic time warping distance
Mixed integer linear programming
Low-voltage distribution network
K-Nearest Neighbor
The authors would like to acknowledge the State Grid Jiangsu Electric Power Co., Ltd. and Jiangsu Frontier Electric Power Technology Co., Ltd. Special thanks to Professor Sun Guoqiang for his careful guidance on this paper.