Federated learning is a distributed machine learning method that addresses the increasingly serious problems of data islands and user data privacy, as it allows training data to be kept locally rather than shared with other users. It trains a global model by aggregating clients' locally computed models rather than their raw data. However, the divergence of local models caused by the data heterogeneity of different clients may lead to slow convergence of the global model. To address this problem, we focus on client selection in federated learning, since the selected local models affect the convergence performance of the global model. We propose FedChoice, a client selection method based on loss function optimization, which selects appropriate local models to improve the convergence of the global model. It first assigns each client a selection probability based on its loss value, so that clients with higher losses are more likely to participate in training. It then introduces a local control vector and a global control vector to predict the local and global gradient directions, respectively, and computes a gradient correction vector to correct the gradient direction and reduce the cumulative deviation of the local gradient caused by Non-IID data. We conduct experiments to verify the effectiveness of FedChoice on the CIFAR-10, CINIC-10, MNIST, EMNIST, and FEMNIST datasets, and the results show that the convergence of FedChoice is significantly improved compared with FedAvg, FedProx, and FedNova.

Mobile devices have become the primary electronic devices for billions of users around the world, and we will witness the explosion of IoT devices in the coming years [

Federated learning consists of three key phases: (1) client selection, in which the server selects random clients and disseminates the global model parameters to them; (2) local training, in which clients use their local data to update the shared model and upload the new model parameters to the server; (3) server aggregation, in which the server builds the global model by averaging the updated local parameters. However, clients have different usage habits; for example, some clients may have more pictures of cats, while others may have more pictures of dogs. In this case, the datasets on the nodes are not independently and identically distributed, i.e., they are Non-IID, and the local models trained on these data are biased. This means that which local models are selected to jointly learn a global model can significantly impact the quality of that model. Since divergence in data distribution between clients may cause parameter divergence between local models, how to select clients becomes essential for training the global model efficiently.
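The three phases above can be sketched end to end as follows; the linear least-squares model, function names, and parameters are illustrative assumptions for this sketch, not the paper's implementation:

```python
import numpy as np

def fedavg_round(global_w, client_datasets, num_selected, lr=0.1, local_steps=5, rng=None):
    """One round of the three phases: selection, local training, aggregation.

    Illustrative sketch with a linear least-squares model; names and
    hyperparameters are assumptions, not from the paper.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # (1) Client selection: uniform random subset of clients.
    selected = rng.choice(len(client_datasets), size=num_selected, replace=False)
    local_ws, sizes = [], []
    for k in selected:
        X, y = client_datasets[k]
        w = global_w.copy()
        # (2) Local training: a few SGD steps on the local squared loss.
        for _ in range(local_steps):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        local_ws.append(w)
        sizes.append(len(y))
    # (3) Server aggregation: average local models weighted by dataset size.
    return np.average(local_ws, axis=0, weights=np.array(sizes, dtype=float))
```

On IID toy data a single round already moves the global model toward the joint optimum; under Non-IID splits the selected subset biases the average, which is exactly the issue the following sections address.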

The uniform random selection algorithm is the most frequently used client selection strategy: it gives every client the same selection probability and hence the same loss expectation. However, in the Non-IID setting, this method cannot accurately select the clients that would be most beneficial for accelerating training, so the convergence performance with Non-IID data cannot be improved. The global optimization goal of federated learning is to minimize the loss over all clients. The loss value generated during local training reflects the model's prediction ability on the client's data: the higher the loss, the poorer the model's performance there. If these losses are appropriately used, the convergence of the global model can be accelerated.

Motivated by the above observations, we propose FedChoice, a federated learning client selection method based on the loss function, which improves the convergence of the global model by selecting appropriate local models. The key contributions are as follows:

It analyzes how the selected local models affect the convergence of the global model and finds that the loss function of the local model is a main influencing factor on the convergence of the global model under Non-IID data.

It proposes a client selection strategy based on the loss function, which gives a higher selection probability to clients with higher loss values, to speed up the convergence of the global model.

It introduces a gradient correction method that corrects the direction of the local gradient vector, improving the model's accuracy while preserving the convergence benefit of our selection method. It employs local and global offset control items to predict the local and global gradient directions, respectively, and calculates a vector that corrects the local gradient direction.

It verifies the effectiveness of our method by comparing its precision and convergence performance against three baseline algorithms, FedAvg, FedProx, and FedNova, on the EMNIST, MNIST, FEMNIST, CINIC-10, and CIFAR-10 datasets. The results show that FedChoice improves the convergence performance of federated learning by up to 18.7% compared with FedAvg.

The rest of this paper is organized as follows. Firstly, we review some related work in

Non-IID data is a challenge for federated learning. There exist extensive works on improving the performance of federated learning via data sharing [

Statistical heterogeneity (also known as the Non-IID problem) is a bottleneck of federated learning [

Many techniques have been proposed to tackle the accuracy degradation of federated learning in the Non-IID setting. FedProx [

In order to improve the convergence performance of the global model, FedNova [

Client selection strategy is a flexible way to improve the performance of federated learning.

There has been some research on applying machine learning to the client selection strategy. Huang et al. [

Some methods use objective indicators as criteria for client selection. Ribero et al. [

We formally define the problem of federated learning and analyze the effects caused by Non-IID data on federated learning in this section. We take a classification task in sample space

And the learning task can be expressed as

To determine

In federated learning, assuming there are

Assuming that the synchronization is conducted every

In the next synchronization, the server will send

Limited by the communication conditions, not all clients can be selected to train the global model in each round. A common way is to select a subset of clients, so, the selection strategy

The most common method of client selection in federated learning is uniform selection, that is, all clients have the same probability of being selected

To overcome this problem, Li et al. [

Specifically, for the global model, the objective of the training is to minimize the loss function of all clients (as defined in

Therefore, this paper will select clients based on the optimization of the loss function to improve the quality of the global model.

In this section, we introduce FedChoice, which optimizes client selection by giving a higher selection probability to clients with high loss values. The flowchart of FedChoice is shown in

When the training data is highly Non-IID, client sampling becomes unstable due to the skewed data. The global model cannot quickly acquire unknown knowledge from local training, leading to slow convergence. Aiming at accelerating the convergence of the global model, this paper applies the client selection strategy

Assuming there are four clients and one server, and there are circular, square and triangular samples, the data distribution with Non-IID is shown in

The

We define the importance weight of the client in Definition 1 to determine the selection probability for clients.

In federated learning, communication and hardware constraints make it difficult for the server to obtain the loss values of all clients. Therefore, for clients that did not participate in this round of training, we keep their value for the next round the same as in the previous round, that is,

A strong tendency to select clients with high loss values can improve the convergence speed, but it causes the reached optimum of the global loss function to deviate from the ideal optimum. That is to say, excessive pursuit of the convergence rate brought by high-loss clients may decrease the accuracy of the global model. It is not advisable to select all clients by using the simple strategy

Firstly, we construct a selection probability function with the importance weight of the client. Let

And then the remaining

The details of the FedChoice algorithm are described in Algorithm 1.
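Algorithm 1 is not reproduced here, but the loss-based selection step it relies on can be sketched as follows. The proportional weighting and the `select_clients` helper are illustrative assumptions; the paper's exact probability function may differ:

```python
import numpy as np

def select_clients(losses, num_selected, rng=None):
    """Sample clients with probability proportional to their latest loss value.

    `losses[k]` is the most recent loss reported by client k (clients that
    were not selected keep their previous value, as described above).
    Proportional weighting is an assumption for illustration only.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    losses = np.asarray(losses, dtype=float)
    probs = losses / losses.sum()  # importance weight -> selection probability
    # Sample without replacement so each client appears at most once per round.
    return rng.choice(len(losses), size=num_selected, replace=False, p=probs)
```

Over repeated rounds, a client with a dominant loss value is selected far more often than the low-loss clients, which is the behavior the strategy aims for.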

As stated in

In the training process of federated learning, the local model is obtained by optimizing the loss function on local data, while the global model is obtained by aggregating the local models participating in training. In the traditional SGD method, the client uploads its updated parameters to the server at each step, so the local deviation of each step is promptly corrected by the global aggregation. Thus, the global optimization direction stays close to the ideal update direction, and the global optimum can be reached in fewer steps.

However, communication costs are relatively high in federated learning, so it is hard to aggregate all local models. As shown in

In federated learning, the accumulation of local update deviation is inevitable owing to the communication problem. If the local update direction can be steered toward the global update direction to reduce the local deviation caused by Non-IID data, the performance of the global model will be greatly improved. Therefore, as shown in
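The correction idea above can be illustrated with a control-variate-style local update in the spirit of SCAFFOLD; the update rule and names below are a sketch under our own assumptions, not the paper's exact formulation:

```python
import numpy as np

def corrected_local_update(w, grad_fn, c_local, c_global, lr=0.1, steps=5):
    """Local SGD with a gradient correction vector (c_global - c_local).

    `c_local` predicts the client's own gradient direction and `c_global`
    the aggregated global direction; their difference steers each local
    step toward the global update. Sketch only (SCAFFOLD-style control
    variates), not the paper's exact rule.
    """
    w = w.copy()
    for _ in range(steps):
        g = grad_fn(w)
        w -= lr * (g - c_local + c_global)  # corrected gradient direction
    # Refresh the local control vector with the latest local gradient.
    new_c_local = grad_fn(w)
    return w, new_c_local
```

With zero control vectors this reduces to plain local SGD; with accurate control vectors the term `c_global - c_local` cancels the client-specific drift that accumulates over multiple local steps on Non-IID data.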

Based on the above definition and analysis, the FedChoice algorithm is proposed to optimize the convergence rate and accuracy of the model under Non-IID data in federated learning. Combined with the algorithm architecture shown in

In order to evaluate the effectiveness of our method, we conduct experiments on five datasets: MNIST [

All the algorithms involved in the experiments are implemented in Python 3.7.2 with the PyTorch framework. We use a Tesla P100 GPU on an Ubuntu 18.04.4 LTS system. The specific configurations of the Linux kernel, graphics card driver, and CUDA are shown in

| Environment | Version |
|---|---|
| Operating system | Ubuntu 18.04.4 LTS |
| GPU | NV Tesla P100 |
| Linux kernel | 4.15.0-123-generic |
| Graphics card driver | 418.87.00 |
| CUDA | 10.1.243 |

We employ MNIST, FEMNIST, EMNIST, CINIC-10, and CIFAR-10 to assess the performance of FedChoice. MNIST [

For each dataset, we apply a Non-IID setting: each dataset is sorted by class and divided into 20 partitions, and each client is randomly assigned 2 partitions from 2 classes; that is, each client holds data with only two class labels. In particular, FEMNIST is already a non-independent and identically distributed dataset and does not need to be re-partitioned.
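A sort-and-shard split of this kind can be sketched as follows; the helper name and the shard-assignment scheme are our own assumptions:

```python
import numpy as np

def noniid_partition(labels, num_clients=10, partitions_per_client=2, rng=None):
    """Sort-and-shard Non-IID split: sort samples by class, cut them into
    shards, and give each client `partitions_per_client` shards.

    With 20 shards shared among 10 clients of 2 shards each, every client
    ends up with samples from at most two class labels, matching the
    setting described above. Sketch only, not the paper's exact code.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    order = np.argsort(labels, kind="stable")  # group sample indices by class
    shards = np.array_split(order, num_clients * partitions_per_client)
    shard_ids = rng.permutation(len(shards))   # random shard-to-client mapping
    return [np.concatenate([shards[i] for i in
                            shard_ids[c * partitions_per_client:
                                      (c + 1) * partitions_per_client]])
            for c in range(num_clients)]
```

Each client receives a disjoint set of sample indices, and because shards are cut from the class-sorted order, a client's data covers at most two labels.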

In this section, we will search for the appropriate hyperparameter

|  | FEMNIST | CIFAR-10 | MNIST | EMNIST | CINIC-10 |
|---|---|---|---|---|---|
| 0.2 | 80.8 | 84.2 | 93.0 | 88.2 | 80.4 |
| 0.8 | 80.9 | 85.2 | 94.4 | 87.4 | 79.5 |

To investigate the accuracy of FedChoice, we evaluate the accuracy of the global model by comparing it with three baseline algorithms on five datasets, and the results are shown in

| Algorithm | FEMNIST | CIFAR-10 | MNIST | EMNIST | CINIC-10 |
|---|---|---|---|---|---|
| FedAvg | 78.9 | 83.1 | 93.2 | 86.4 | 75.7 |
| FedNova | 80.5 | 85.7 | 94.2 | 88.0 | 78.6 |
| FedProx | 80.0 | 86.1 | 95.7 | 88.2 | 79.3 |

To analyze the convergence of the global model, we set 80% as the convergence accuracy of FEMNIST, EMNIST and CIFAR-10 datasets, 90% as the convergence accuracy of MNIST, and 75% as the convergence accuracy of CINIC-10.

| Algorithm | Epoch (50%) | Time (50%) | Epoch (60%) | Time (60%) | Epoch (70%) | Time (70%) | Epoch (80%) | Time (80%) |
|---|---|---|---|---|---|---|---|---|
| FedChoice |  |  |  |  |  |  |  |  |
| FedAvg | 62 | 1088 | 76 | 1149 | 100 | 1503 | 191 | 2857 |
| FedNova | 59 | 962 | 71 | 1102 | 95 | 1470 | 182 | 2832 |
| FedProx | 60 | 972 | 73 | 1150 | 98 | 1520 | 185 | 2873 |

| Algorithm | Epoch (50%) | Time (50%) | Epoch (60%) | Time (60%) | Epoch (70%) | Time (70%) | Epoch (80%) | Time (80%) |
|---|---|---|---|---|---|---|---|---|
| FedChoice |  |  |  |  |  |  |  |  |
| FedAvg | 46 | 1388 | 60 | 1803 | 70 | 2648 | 131 | 4038 |
| FedNova | 35 | 1078 | 50 | 1534 | 74 | 2398 | 124 | 3802 |
| FedProx | 33 | 1062 | 47 | 1506 | 52 | 2044 | 112 | 3690 |

| Algorithm | Epoch (60%) | Time (60%) | Epoch (70%) | Time (70%) | Epoch (80%) | Time (80%) | Epoch (90%) | Time (90%) |
|---|---|---|---|---|---|---|---|---|
| FedChoice |  |  |  |  |  |  |  |  |
| FedAvg | 46 | 506 | 53 | 583 | 88 | 968 | 102 | 1122 |
| FedNova | 48 | 576 | 56 | 672 | 69 | 828 | 110 | 1320 |
| FedProx | 27 | 302 | 42 | 527 | 65 | 762 | 80 | 922 |

| Algorithm | Epoch (50%) | Time (50%) | Epoch (60%) | Time (60%) | Epoch (70%) | Time (70%) | Epoch (80%) | Time (80%) |
|---|---|---|---|---|---|---|---|---|
| FedChoice |  |  |  |  |  |  |  |  |
| FedAvg | 37 | 629 | 50 | 850 | 75 | 1275 | 127 | 2159 |
| FedNova | 33 | 578 | 48 | 841 | 72 | 1264 | 118 | 2065 |
| FedProx | 32 | 550 | 48 | 826 | 69 | 1187 | 115 | 1978 |

| Algorithm | Epoch (50%) | Time (50%) | Epoch (60%) | Time (60%) | Epoch (70%) | Time (70%) | Epoch (75%) | Time (75%) |
|---|---|---|---|---|---|---|---|---|
| FedChoice |  |  |  |  |  |  |  |  |
| FedAvg | 48 | 1488 | 59 | 1829 | 91 | 2821 | 128 | 3842 |
| FedNova | 49 | 1617 | 61 | 2013 | 83 | 2739 | 124 | 4092 |
| FedProx | 45 | 1440 | 54 | 1728 | 71 | 2272 | 98 | 3136 |

In this paper, we propose FedChoice, a client selection method based on loss function optimization for federated learning, to solve the inefficient convergence problem under Non-IID data. FedChoice gives a higher selection probability to clients with high loss values, which makes them more likely to participate in training, thus speeding up the convergence of the global model. Besides, we introduce local and global offset control items to predict the local and global gradient directions and calculate a vector to correct the local gradient direction, reducing the accumulated deviation of the local gradient caused by Non-IID data. The experiments validate the effectiveness of FedChoice and demonstrate significant improvements in convergence and accuracy compared with FedAvg, FedProx, and FedNova. In the future, we will explore a more effective probability extraction formula from the perspective of gradient updates.

This work is supported by the

The authors declare that they have no conflicts of interest to report regarding the present study.