Nowadays, smart wearable devices are widely used in the Social Internet of Things (IoT), recording human physiological data in real time. To protect the data privacy of smart devices, researchers have paid increasing attention to federated learning. Although it mitigates the data leakage problem, new challenges have emerged. Asynchronous federated learning shortens the convergence time, but it suffers from time delay and data heterogeneity problems, both of which harm accuracy. To overcome these issues, we propose an asynchronous federated learning scheme based on double compensation. The scheme improves the Delay Compensated Asynchronous Stochastic Gradient Descent (DCASGD) algorithm, based on a second-order Taylor expansion, as the delay compensation, and adds the FedProx operator to the objective function as the heterogeneity compensation. Besides, the proposed scheme motivates the federated learning process by adjusting the roles of the participants and the central server. We conduct multiple sets of experiments in both conventional and heterogeneous scenarios. The experimental results show that our scheme improves accuracy by about 5% while keeping the complexity constant. Numerical experiments also show that our scheme converges more smoothly during training and adapts better to heterogeneous environments. The proposed double-compensation-based federated learning scheme is highly accurate, flexible in terms of participants, and smooths the training process. Hence it is well suited to the data privacy protection of smart wearable devices.
The smart wearable device is a type of Internet of Things (IoT) equipment for personal health monitoring. It can obtain the human body status in real time and communicate with other IoT devices. After acquiring the physiological data, the device transmits the data to a mobile application or an intermediate router and finally to a cloud-based database. There have been various applications of social IoT in recent years [
Studies are emerging on the use of Artificial Intelligence (AI) techniques and various types of physiological data to assess users’ health levels comprehensively [
Data security [
These problems have been improved somewhat since federated learning was proposed [
As federated learning is still at an early stage, many issues still need to be resolved. In recent years, synchronous federated learning models [
This paper aims to form a flexible federated learning scheme with better security and usability to protect sensitive data in smart wearable devices. We propose an asynchronous federated learning scheme based on double compensation, applied in the horizontal federated learning scenario to solve the time delay and data heterogeneity problems. The proposed scheme has clear advantages over existing methods in accuracy, flexibility, and convergence smoothness.
We summarize our main results and contributions as follows:
We propose an asynchronous federated learning scheme based on double compensation. The scheme uses the DCASGD method [
Our scheme performs a role reversal between the participants and the central server. The entire process is participant-centered, which makes the update process more flexible and incentivizes the participants. This design motivates the federated learning process.
We theoretically show that the existing methods suffer from the previously mentioned problems, and that our scheme can solve them while simplifying the parameter update process.
Experiments show that our scheme effectively improves the accuracy of asynchronous federated learning algorithms by about 5%; this accuracy is otherwise compromised by the time delay and data heterogeneity problems.
Section 2 introduces the basic process of federated learning and the classic federated averaging (FedAvg) [
As data becomes more commoditized and personalized, privacy protection becomes an increasing concern for data owners. However, to maximize the value of their data, for example by using machine learning to make more accurate predictions and classifications, data owners need to work together so that the data is pooled as widely as possible. This collaborative approach inevitably leads to the problem of privacy leakage. Federated learning [
The subject of federated learning includes the participant and the central server [
The primary process of federated learning is divided into the following steps [
In addition, another essential factor in federated learning is data. Each participant has different amounts and types of data. An appropriate federated learning algorithm needs to be set up to better aggregate gradients, models, and other information according to these circumstances. Depending on the amount and type of data, the main types of federated learning are horizontal federated learning, vertical federated learning, and federated transfer learning.
Horizontal federated learning is also known as sample-partitioned federated learning [
The mathematical description of the horizontal federated learning scenario is as follows. The set
Horizontal federated learning implies that each participant has a complete data type. The implementation relies on the training results of each participant. Suppose the overall objective function is a weighted average of the objective functions of the participants, as holds for the MSE, Cross-Entropy, and other common objective functions. Then the derivatives theoretically also satisfy this linear relationship. The theory is as follows.
Let the overall loss function be the sample-weighted average of the participants' local losses, $F(w) = \sum_{k=1}^{M} \frac{n_k}{n} F_k(w)$, where $F_k$ is the local loss of participant $k$, $n_k$ is its sample count, and $n = \sum_{k=1}^{M} n_k$. Taking derivatives, the overall gradient satisfies the same linear relationship, $\nabla F(w) = \sum_{k=1}^{M} \frac{n_k}{n} \nabla F_k(w)$. From this relationship, aggregating the participants' gradients with weights $n_k/n$ is equivalent to computing the gradient on the pooled data.
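This linearity can be checked numerically. The sketch below uses purely illustrative synthetic linear-regression data with an MSE loss; the participant sizes (60 and 40 samples) are arbitrary:

```python
import numpy as np

def mse_grad(w, X, y):
    # Gradient of the mean-squared error (1/2n)*||Xw - y||^2 w.r.t. w
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
# Two participants with different sample counts n_k
X1, y1 = rng.normal(size=(60, 3)), rng.normal(size=60)
X2, y2 = rng.normal(size=(40, 3)), rng.normal(size=40)
w = rng.normal(size=3)

# Global gradient computed on the pooled data
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])
g_global = mse_grad(w, X, y)

# Weighted average of the local gradients, weights n_k / n
g_avg = (60 * mse_grad(w, X1, y1) + 40 * mse_grad(w, X2, y2)) / 100

assert np.allclose(g_global, g_avg)
```

The assertion passes because the MSE over the pooled data is exactly the sample-weighted average of the two local MSE losses.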
The smart wearable system is a crucial application scenario of the social IoT. Compared with other data security scenarios, smart wearable systems do not have powerful computing and storage capabilities on the device side. Moreover, their data are real-time, distributed, and multi-typed. When designing a privacy protection scheme for this scenario, the existing data security technologies need to be improved. The following section describes some federated learning methods suitable for the smart wearable scenario and summarizes their limitations.
The above solutions have the following problems. Current asynchronous federated learning assigns update weights according to the length of the delay [
The asynchronous federated learning algorithm has emerged to solve the time-consumption problem. It replaces the synchronous model with an asynchronous one, where the participants can join in updating the primary model and obtain the updated parameters at any time. For example, all
However, new issues arise. Updating in asynchronous federated learning lacks the solid mathematical foundation of synchronous models. A common update method for multilayer neural networks is gradient descent, whose primary requirement is to update the current parameters. On the one hand, in asynchronous federated learning, as the master model is constantly being updated, the model parameters being optimized by a participant tend to mismatch the current model parameters of the central server. On the other hand, federated learning based on model updates also suffers from data heterogeneity due to the non-independent and non-identically distributed (Non-IID) nature of the participants' data. This problem makes it difficult for the aggregated model in the central server to converge after each participant has trained several times locally. Besides, motivating participants is also a hot issue in current federated learning research.
We build on previous studies and propose an asynchronous federated learning scheme based on double compensation, which solves these problems to the greatest extent and makes the scheme more practical. The flow and effect of each component in our scheme are shown in
The delay problem arises from asynchronous federated learning. The basic steps of asynchronous federated learning are as follows. First, the central server sends the initial model parameters to each participant. After local training, the participant individually sends the current model gradient to the central server. Finally, the central server returns the new model parameters directly to the participant after updating.
However, the asynchronous federated learning algorithm suffers from the time delay problem, depicted in
There is less existing research on asynchronous federated learning. Xie et al. proposed the FedAsync algorithm based on model averaging. It compensates for the loss during temporal asynchrony by weighting the parameters uploaded by the participants: the central model is updated as a weighted average of the current server model and the uploaded model, with the weight determined by a staleness function of the delay.
As the algorithm is based on a staleness function, the following problems arise in its implementation. First, slower-training participants are underweighted, while faster-training participants not only participate in adjusting the model parameters several times but also receive higher weights. So the final model is likely to be biased towards the faster participants, resulting in poor model training. Second, the paper intuitively assumes that the time delay correlates with the error. When we revisit the asynchronous federated learning process, we find that the error arises from the update of the central server parameters; the quantity actually correlated with the error should be the distance between
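For reference, the staleness-weighted averaging discussed above can be sketched as follows. The hinge parameters `a`, `b` and mixing weight `alpha` are illustrative assumptions in the spirit of FedAsync, not the values used in the later experiments:

```python
import numpy as np

def hinge_staleness(t, tau, a=4, b=2, alpha=0.6):
    # Hinge staleness function: full weight while the delay t - tau is
    # at most b, then a weight that decays with the delay.
    s = 1.0 if t - tau <= b else 1.0 / (a * (t - tau - b) + 1.0)
    return alpha * s

def fedasync_update(x_server, x_uploaded, t, tau):
    # Mix the participant's (possibly stale) model into the server model,
    # down-weighting it according to how long ago (tau) it was issued.
    a_t = hinge_staleness(t, tau)
    return (1 - a_t) * x_server + a_t * x_uploaded

fresh = fedasync_update(np.array([0.0]), np.array([1.0]), t=2, tau=0)   # full weight alpha
stale = fedasync_update(np.array([0.0]), np.array([1.0]), t=10, tau=0)  # heavily down-weighted
```

Note that the weight depends only on the elapsed time, not on how far the server parameters have actually drifted, which is exactly the criticism raised above.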
In the study of distributed deep learning, Zheng et al. propose the DCASGD method to compensate for the losses in asynchronous distributed learning [
Zheng et al. also demonstrate that
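A minimal sketch of the DCASGD update follows; the step size `eta` and compensation weight `lam` are illustrative assumptions:

```python
import numpy as np

def dcasgd_update(w_t, w_tau, grad_tau, eta=0.1, lam=0.5):
    """Delay-Compensated ASGD step in the style of Zheng et al.

    grad_tau was computed at the stale parameters w_tau. The term
    lam * g * g * (w_t - w_tau) approximates the first-order Taylor
    correction of the gradient, using the element-wise product g * g
    as a cheap surrogate for the Hessian-vector product.
    """
    compensated = grad_tau + lam * grad_tau * grad_tau * (w_t - w_tau)
    return w_t - eta * compensated
```

When the server has not moved (`w_t == w_tau`) the compensation term vanishes and the step reduces to plain SGD, which is the behavior one would want from a delay correction.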
The difference between common distributed learning and federated learning lies in the different structures of the data sets. Distributed learning aims to improve the efficiency of machine learning, and the data is deliberately partitioned so that each subset has roughly the same structure. Federated learning is built on user privacy, and no changes can be made to the form of the data owned by the users. The data therefore exhibit a Non-IID phenomenon, which creates the data heterogeneity problem.
The different data structures cause the distributed models to optimize in different directions during training, so the linear relationship with the overall model breaks down and the training may not even converge. As a result, the simple FedAvg algorithm cannot be used. The rationale is explained as follows. Consider a Non-IID federated learning scenario with two participants, participant
And the optimal solution of the overall model is $w^* = \arg\min_w \left(\tfrac{n_1}{n}F_1(w) + \tfrac{n_2}{n}F_2(w)\right)$. There is no guaranteed linear relationship between $w^*$ and the local optima $w_1^*, w_2^*$; in general $w^* \neq \tfrac{n_1}{n}w_1^* + \tfrac{n_2}{n}w_2^*$, so simply averaging the locally optimal models does not recover the overall optimum.
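A minimal numerical illustration of this failure, using two assumed one-dimensional quadratic local losses with different curvatures (a stand-in for heterogeneous data):

```python
import numpy as np

# Two participants with equal sample counts but heterogeneous data:
# local losses F1(w) = 2*(w - 1)^2 and F2(w) = (w + 1)^2.
f = lambda w: 2 * (w - 1) ** 2 + (w + 1) ** 2   # overall loss, equal weights

w1_opt, w2_opt = 1.0, -1.0          # local optima of F1 and F2
w_avg = (w1_opt + w2_opt) / 2       # FedAvg-style average of local optima -> 0.0

w = np.linspace(-2, 2, 400001)
w_opt = w[np.argmin(f(w))]          # true overall optimum -> 1/3

assert abs(w_avg - w_opt) > 0.3     # averaging the local optima misses it
```

Because the two local losses have different curvatures, the overall optimum (1/3) is pulled toward the steeper loss, while the plain average of the local optima lands at 0.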
This paper uses the FedProx algorithm, based on proximal-term optimization, to alleviate the data heterogeneity problem [
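As a sketch, the FedProx heterogeneity compensation can be written as a wrapper around any local loss. The quadratic proximal term follows the FedProx formulation; the symbol `mu` and the function names are our assumptions:

```python
import numpy as np

def fedprox_loss(w, w_global, local_loss, mu=0.1):
    # Local objective with the FedProx proximal operator: the term
    # (mu/2) * ||w - w_global||^2 pulls the local model back toward
    # the current global model, limiting client drift under Non-IID data.
    return local_loss(w) + 0.5 * mu * np.sum((w - w_global) ** 2)
```

The larger `mu` is, the more the local training is anchored to the global model, trading local fit for easier aggregation.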
The scheme designs a motivating algorithm, built on the double compensations, to enhance the enthusiasm of the participants. Motivation for federated learning is a hot issue in current research [
However, the participants cannot see the training results at any time and can only follow the instructions of the protocol to complete the training within a fixed number of rounds. In the long term, their motivation can be severely dampened, leading to reduced participation and ineffective continuation of federated learning outcomes. The introduction of asynchronous federated learning has reduced the time cost of the federated learning system and made the update-issuance process more flexible, but it still requires participants to deploy a rigorous environment and conduct a fixed number of training rounds. Overall, it remains a semi-normative learning process.
The scheme proposed in this paper performs a role reversal between the two parties. The entire process is participantcentered, and participants will focus on completing their training. The central server or the coordinator is involved in a supporting role throughout the process.
Inspired by
This results in a new federated learning scheme: the participants train to approximate convergence, and the central server performs gradient descent. The new model has three motivational effects. First, the participants are motivated because they can train with settings appropriate to themselves and see the results of their models at any time. Second, the participants' autonomy increases because the restrictions on training time and the number of training rounds are lifted. Third, the participants' initiative to join federated learning is enhanced, because the central server acts in a supporting role, helping the participants escape local minimum traps and overfitting.
From the illustrations in 3.1 and 3.2 with
A gradient descent at the central server gives that:
A system overview is illustrated in
The detailed algorithm is shown in Algorithm 1 (
Algorithm 1: Federated learning based on double compensation

Central server:
1: Initialize the model parameters
2: Send the model to each participant
3: loop
4:   Receive the pair uploaded by participant k
5:   Compute the delay-compensated gradient and update the model by gradient descent
6:   Send the updated model back to participant k
7: end loop

Participant k:
8: Receive the model from the central server
9: Define the local loss function with the FedProx operator
10: repeat
11:   Randomly sample a mini-batch
12:   Update the local model by gradient descent
13: until the loss falls below the threshold
14: Push the trained model to the central server
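The interplay of the two compensations in the scheme can be sketched in a toy round. The quadratic local loss, all hyperparameters, and the use of the model delta as the uploaded pseudo-gradient are our assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def local_train(w_issued, A, b, mu=0.1, lr=0.05, eps=1e-6):
    """Participant side: minimise a quadratic local loss (gradient A@w - b)
    plus the FedProx proximal term until the update step is tiny."""
    w = w_issued.copy()
    step = np.inf
    while step > eps:
        grad = A @ w - b + mu * (w - w_issued)  # local gradient + prox gradient
        w -= lr * grad
        step = np.linalg.norm(lr * grad)
    return w

def server_update(w_now, w_issued, g_k, eta=0.1, lam=0.5):
    """Server side: DCASGD-style delay-compensated gradient descent.
    The server model w_now may have drifted since w_issued was sent;
    lam * g_k * g_k * (w_now - w_issued) compensates for that staleness."""
    g_hat = g_k + lam * g_k * g_k * (w_now - w_issued)
    return w_now - eta * g_hat

# One asynchronous round with a single reporting participant
A1, b1 = np.diag([2.0, 1.0]), np.array([2.0, -1.0])
w_server = np.zeros(2)

w_issued = w_server.copy()
w_k = local_train(w_issued, A1, b1)      # train to approximate convergence
g_k = w_issued - w_k                     # model delta as pseudo-gradient (assumption)
w_server += np.array([0.05, -0.05])      # meanwhile, another update moved the server model
w_server = server_update(w_server, w_issued, g_k)
```

Even though the server moved between issuing and receiving the model, the compensated step still pulls the server model toward the participant's optimum rather than applying the stale gradient blindly.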
Notation  Description
          The moment when the central server issues the model to the participant
          The moment when the central server receives the results of the participant
          The moment when the central server model is updated
k         The participant's index
          The overall model parameters at the moment of updating
          The training results for participant k
          Regularization weight
          Parameter to improve the accuracy of the time delay compensation
          Loss threshold
          The gradient of participant k
          All data of the participants
          Loss function with parameters for the model
          The gradient descent step size of the central server
The data used in this paper were collected in the field from smart wearable devices and tabulated afterwards. Data from smart wearable devices are often private, yet they can be used to provide better services to users, so applying federated learning to smart wearable devices is essential. The data contain eight sets of screened characteristics related to cardiorespiratory fitness: rapid heart rate, blood oxygen variance, running time, heart rate rise speed, heart rate reserve, heart rate fall speed, resting heart rate, and maximum heart rate. The labels are derived from ratings of 800/1000-meter running performance on a physical test closely related to adolescent fitness, with four levels
In the simulation experiments, the training set was divided into
The specific setup and description of the simulation experiment are as follows.
Our method and the existing FedAsync method have few similarities, and their parameters play different roles. This paper therefore selects multiple sets of parameters for comparison experiments to obtain an overall comparison.
In the simulation experiments, we randomly generate a sequential list of participants involved in the overall model update to implement the asynchronous scenario. It can simulate asynchronous learning in a variety of training situations. The list length is set to 30. When the update proceeds to the participant
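Such a random participant schedule can be generated as follows; the seed and sampling with replacement are our assumptions:

```python
import random

def make_schedule(num_participants, length=30, seed=42):
    # Random order in which participants reach the central server,
    # simulating the asynchronous arrival of model updates.
    rng = random.Random(seed)
    return [rng.randrange(num_participants) for _ in range(length)]

schedule = make_schedule(4)  # e.g. M = 4 participants, list length 30
```

Each entry names the participant whose update the server processes next, so one list realizes one asynchronous training trajectory.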
To compare the convergence speed of the two methods more intuitively, we give each participant enough rounds to allow comparison at the same training level, which satisfies the approximate convergence condition of the proposed method. In the experiments, the number of local training rounds for each participant is set to 1000.
For the comparison experiments with the FedAsync algorithm, we choose the more effective hinge update method, whose staleness function is described in
In this experiment, the FedProx operator was added to the loss functions of both algorithms. The FedProx operator was set with the parameter
In the model training, we find that the optimal value of the step size
For different numbers of participants and training parameters, the loss descent curves of our method and the FedAsync algorithm (FedAsync-hinge) are shown in
Number of participants  Algorithms      Parameter values  Accuracy after 30 rounds
M = 2                   Our method                        0.9019
                        FedAsync-hinge                    0.8485
                        FedAsync-hinge                    0.8473
                        FedAsync-hinge                    0.8552
M = 4                   Our method                        0.9039
                        FedAsync-hinge                    0.8486
                        FedAsync-hinge                    0.8503
                        FedAsync-hinge                    0.8512
M = 6                   Our method                        0.9060
                        FedAsync-hinge                    0.8543
                        FedAsync-hinge                    0.8521
                        FedAsync-hinge                    0.8451
In
To make the data distribution of each participant as dispersed as possible and form a strongly heterogeneous data environment, we group and sort the datasets by label before dividing them among the participants. In this heterogeneous environment, the loss descent curves and accuracy of our scheme and the FedAsync-hinge algorithm are shown in
Some measures of classification performance, including Precision, Recall, and
Number of participants  Algorithms      Parameter values  Accuracy after 30 rounds
M = 2                   Our method                        0.9031
                        FedAsync-hinge                    0.8513
                        FedAsync-hinge                    0.8498
                        FedAsync-hinge                    0.8516
M = 4                   Our method                        0.9082
                        FedAsync-hinge                    0.8461
                        FedAsync-hinge                    0.8540
                        FedAsync-hinge                    0.8589
M = 6                   Our method                        0.9088
                        FedAsync-hinge                    0.8531
                        FedAsync-hinge                    0.8540
                        FedAsync-hinge                    0.8477
Level      Our method  FedAsync  FedAsync  FedAsync
Excellent  0.8371      0.5212    0.6342    0.6048
Good       0.9538      0.9193    0.9236    0.9512
Pass       0.9061      0.7923    0.7681    0.8449
Poor       0.7736      0.5978    0.6700    0.7547
As seen above, the scheme proposed in this paper changes the update method of federated learning. Both the FedAsync method and our method rely on scalar multiplication and addition operations on tensors. Although our method also involves two element-wise multiplication operations, the time complexity of both update methods is $O(n)$.
In the scenario of this paper, the training loss with time is as
In terms of space complexity, all three methods (our method, the FedAsync method, and asynchronous federated learning without the two compensations) occupy space on the order of magnitude of the model parameters, with a complexity of $O(n)$.
This subsection explains the performance of the proposed work.
The research highlights of our method are listed as follows.
It solves two main problems: the time delay problem and the data heterogeneity problem. Both of these problems can lead to low accuracy.
The accuracy is about 5% higher on the training set than the FedAsync algorithm.
The convergence of our method is smoother during the training process, and it can adapt better in heterogeneous environments.
In terms of time complexity, the scheme in this paper and the other asynchronous federated learning algorithms only modify the update method of the central server; therefore, they consume the same order of magnitude of time for each round of training. Our method has a solid mathematical basis, reduces the number of parameters that need to be tuned, and has good practicality. In contrast to methods such as FedAsync, which average categories of models, the scheme in this paper uses a gradient descent method with better accuracy and a stronger mathematical basis, avoiding the influence of the participants' local models in the process of model averaging.
In this paper, we proposed an asynchronous federated learning scheme based on double compensation to solve time delay and data heterogeneity problems. Our experiments showed that the proposed scheme converged more smoothly and was more accurate than the existing algorithm. It performs better in scenarios where data from smart wearable devices are analyzed and processed at scale. For federated learning, the current algorithms only consider the case of honest participants and central servers. In the future, we will continue to study the problem of security mechanisms for dishonest participants or central servers in the flexible federated learning environment proposed in this paper.
This research is supported by the National Natural Science Foundation of China, No. 61977006.
The authors declare they have no conflicts of interest to report regarding the present study.