
To optimize the energy consumption of edge servers during edge task offloading, this paper proposes a task offloading algorithm based on reinforcement learning. The algorithm observes and analyzes the current environment state, selects the deployment location of each edge task according to that state, and thereby achieves energy-aware edge task offloading. To this end, we first construct a network energy consumption model that includes both server energy consumption and link transmission energy consumption, which improves the accuracy of network energy consumption evaluation. Because the edge environment is complex and variable, we design a task offloading algorithm based on Proximal Policy Optimization (PPO), and we use Dijkstra's algorithm to determine the path between the edge servers on which adjacent tasks are deployed. Finally, extensive simulation experiments verify the effectiveness of the proposed method. Compared with the baseline algorithms, the proposed algorithm achieves an average energy saving of 22.69%.

In recent years, with the rapid development of Internet of Things (IoT) technology, tens of billions of terminal devices (TDs) in IoT networks have achieved economical and efficient interconnection. According to the Cisco Visual Networking Index forecast, by 2023 IoT devices will account for 50% of all networked devices, and the number of connections between devices will reach 14.7 billion [

Cloud computing offloads tasks to remote cloud servers and improves service response speed by using the computing resources in the cloud [

Energy consumption optimization during task offloading has attracted extensive attention. However, most existing works focus on energy management and resource allocation for terminal devices and lack energy management methods for edge servers,

We focus on optimizing the energy consumption of edge servers during task offloading.

In this paper, we first construct a mathematical model that includes server energy consumption and link energy consumption. Because the edge computing environment is complex, a reliable and scalable learning algorithm is needed. Based on this analysis, we design an edge task offloading algorithm based on Proximal Policy Optimization (PPO). The technical contributions of this paper are summarized as follows:

To accurately describe the energy consumption of the edge network during task offloading, we construct a mathematical model that includes both server energy consumption and link energy consumption. The processing energy consumption of a server is directly proportional to its CPU utilization, and the transmission energy consumption of a link is proportional to its bandwidth utilization.

In this paper, the reward function of the reinforcement learning algorithm is designed according to the network energy consumption model: the greater the energy consumption, the smaller the reward. The task offloading strategy designed in this paper selects the edge server on which each task is deployed, and the path between the servers hosting adjacent tasks is determined by Dijkstra's algorithm. Once a task is offloaded, the available resources in the edge computing environment (i.e., the environment state) are updated.
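The path selection and the reward rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the topology is taken to be an undirected graph stored as an adjacency dictionary with non-negative link weights (hop counts in the example), and the reward is assumed to be the negative increase in network energy caused by one offloading action. The names `dijkstra_path` and `reward` are hypothetical, and the paper's actual link weights and reward formula are not given in this section.

```python
import heapq


def dijkstra_path(adj, src, dst):
    """Shortest path from src to dst on an undirected weighted graph.

    adj maps each node to a list of (neighbor, weight) pairs, weight >= 0.
    Returns the node list of the path, or None if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == dst:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in visited:
        return None
    # Walk predecessor links back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def reward(energy_before, energy_after):
    # Hypothetical shaping: the more extra energy an offloading action
    # causes, the smaller (more negative) the reward.
    return -(energy_after - energy_before)


if __name__ == "__main__":
    adj = {}
    for a, b, w in [("s1", "s2", 1), ("s2", "s3", 1), ("s1", "s3", 3)]:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    print(dijkstra_path(adj, "s1", "s3"))
```

In the example topology the two-hop route s1, s2, s3 has total weight 2, which is smaller than the direct s1 to s3 link of weight 3, so Dijkstra selects the two-hop path.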

The rest of this paper is organized as follows. In the second part, we briefly review related work. In the third part, we construct a mathematical model to describe the energy consumption of the edge network and then, after considering various constraints, formulate the problem as an ILP model. In the fourth part, we formulate the problem as an MDP and design a PPO-based task offloading algorithm. In the fifth part, the task offloading algorithm is simulated and the simulation results are analyzed. The last part summarizes this paper and discusses directions for future improvement.

In an edge computing environment, terminal devices can choose to offload some tasks to edge servers. This not only reduces the energy consumption of the devices, but also lowers the risk of privacy leakage inherent in traditional cloud computing and improves the real-time performance of task processing. However, the existing literature focuses on reducing terminal energy consumption and improving the response speed of offloaded tasks. Reference [

Dynamic Voltage Scaling (DVS) dynamically adjusts the supply voltage and frequency to reduce energy consumption while ensuring the quality of service of real-time tasks [

In this section, we first construct a task offloading model for IoT with energy consumption as the optimization goal. Taking into account bandwidth, computing resource, and traffic conservation constraints, we then obtain an optimization problem model.

In this paper, an undirected graph

In this section, the task queue on each terminal is modeled as a directed graph, and the tasks to be offloaded in a queue are distinct. Each task queue can be described by a four-tuple

The edge network energy consumption model constructed in this paper includes the energy consumption of edge servers and the energy consumed by physical links transmitting traffic.

In addition to the energy consumed by computing tasks, the energy consumption of storage and communication devices on an edge server is also considerable. Therefore, server energy consumption is modeled as two parts: start-up energy consumption and the energy consumed processing offloaded tasks. The former is the energy required to keep the edge server in normal operation; it depends only on whether any task is deployed on the server, regardless of how many tasks are deployed. The latter is positively correlated with CPU utilization. We use

Since the processing energy consumption of an edge server is positively related to its CPU utilization, it is calculated as follows:

In the above formula,

Similarly, the energy consumption of physical links in the edge network includes the power consumption of the switches on the links and the transmission energy consumed when links carry traffic between servers. The former depends on whether the switches on a link are powered on, and the latter depends on the bandwidth utilization of the physical link. In this section,

Based on the above analysis, the total energy consumption of
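The server and link energy terms described in this subsection can be sketched as follows. The paper's own formulas are not reproduced in this section, so the linear forms below are assumptions that only follow the stated dependencies: start-up power is charged once if at least one task is deployed on a server, processing power grows with CPU utilization, switch power depends on the switch's on/off state, and transmission power grows with bandwidth utilization. All parameter names (`p_idle`, `p_max`, `p_switch`, `p_tx_max`) are hypothetical.

```python
def server_power(cpu_util, has_tasks, p_idle, p_max):
    """Assumed linear power model of one edge server (W)."""
    if not 0.0 <= cpu_util <= 1.0:
        raise ValueError("cpu_util must be in [0, 1]")
    if not has_tasks:
        return 0.0  # no deployed task: neither start-up nor processing power
    # Start-up power is paid once, however many tasks are deployed;
    # processing power grows linearly with CPU utilization.
    return p_idle + (p_max - p_idle) * cpu_util


def link_power(traffic_mbps, capacity_mbps, switch_on, p_switch, p_tx_max):
    """Assumed power model of one physical link (W)."""
    if capacity_mbps <= 0 or not 0 <= traffic_mbps <= capacity_mbps:
        raise ValueError("traffic must lie in [0, capacity]")
    if not switch_on:
        if traffic_mbps > 0:
            raise ValueError("a link whose switch is off cannot carry traffic")
        return 0.0
    # Switch power depends only on the on/off state; transmission power
    # grows linearly with bandwidth utilization.
    return p_switch + p_tx_max * (traffic_mbps / capacity_mbps)


def network_energy(servers, links, duration_s):
    """Total network energy (J) over duration_s seconds, assuming constant power.

    servers: list of (cpu_util, has_tasks, p_idle, p_max) tuples.
    links: list of (traffic_mbps, capacity_mbps, switch_on, p_switch, p_tx_max).
    """
    power = sum(server_power(*srv) for srv in servers)
    power += sum(link_power(*lk) for lk in links)
    return power * duration_s
```

Because the start-up term is paid per active server, consolidating tasks on fewer servers can lower total energy even when per-server CPU utilization rises; this is one way the start-up component can influence placement decisions.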

In this section, taking into account the processing-order constraints of tasks in a queue, the computing capacity constraints, and the network bandwidth constraints, we formulate the terminal task offloading problem as an ILP model whose objective is to minimize energy consumption.

First, to satisfy the traffic constraints during task offloading, this section assumes that a task sequence to be offloaded is

We assume that each task in the task sequence can be offloaded to only one edge server:

Tasks in a sequence must be offloaded in accordance with the dependency relationships between them, which is expressed as follows:

In addition, this section considers the computing capacity constraints of edge servers and the bandwidth constraints of physical links.
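The single-placement, computing capacity, and bandwidth constraints can be expressed as simple feasibility checks on a candidate offloading scheme. The sketch below assumes a binary placement matrix `x`, a task-to-server mapping, and undirected links keyed by `frozenset({u, v})`; this notation and the function names are hypothetical, and the task dependency constraint is omitted because its formula is not given in this section.

```python
from collections import defaultdict


def single_placement_ok(x):
    """Each task is offloaded to exactly one edge server.

    x[i][v] is 1 if task i is placed on server v, else 0 (hypothetical notation).
    """
    return all(set(row) <= {0, 1} and sum(row) == 1 for row in x)


def capacity_ok(placement, cpu_demand, cpu_capacity):
    """Total CPU demand placed on each server must not exceed its capacity.

    placement: task -> server; cpu_demand: task -> demand; cpu_capacity: server -> capacity.
    """
    load = defaultdict(float)
    for task, server in placement.items():
        load[server] += cpu_demand[task]
    return all(load[s] <= cpu_capacity[s] for s in load)


def bandwidth_ok(flows, bw_capacity):
    """Traffic routed over each undirected link must not exceed its bandwidth.

    flows: list of (path, rate_mbps), where path is a node list, e.g. the
    path between the servers hosting two adjacent tasks.
    bw_capacity: frozenset({u, v}) -> capacity in Mbps.
    """
    used = defaultdict(float)
    for path, rate in flows:
        for u, v in zip(path, path[1:]):
            used[frozenset((u, v))] += rate
    return all(used[e] <= bw_capacity[e] for e in used)
```

A candidate scheme would be feasible only if all three checks return `True`; in the ILP these checks become linear constraints over the binary decision variables.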

In summary, the ILP model for the Energy Efficient Task Offload (EETO) problem can be expressed as follows:

A Markov chain is a probabilistic model in which the future state depends only on the current state. A Markov decision process (MDP) is a decision process built on a Markov chain. An MDP can be represented by a five-tuple

An edge server in the edge network can act as an agent that obtains the available resources and task offloading information of the edge network topology through sensing modules installed in the physical network. In a given state, the environment state after the agent performs an action depends only on the current state and not on historical states, i.e., the process has no aftereffect. Therefore, the edge task offloading problem can be expressed as an MDP model. The MDP-based edge task offloading problem is presented as follows:

In the above formula, the source of action

The optimization goal of the MDP model established in this paper is to find an optimal policy that maximizes the objective of reinforcement learning, namely the expected cumulative return:
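To make the MDP formulation concrete, the toy environment below encodes one possible choice of state, action, and reward, together with the discounted cumulative return whose expectation the policy maximizes. The state layout, the reward (negative incremental server power, with link energy left out), the power constants, and the class and function names are assumptions for illustration only; the discount factor `gamma` is likewise a hypothetical default.

```python
class ToyOffloadEnv:
    """Hypothetical toy environment for the task offloading MDP.

    State: (index of the next task, remaining CPU on each server).
    Action: index of the edge server that receives the next task.
    Reward: minus the extra server power caused by the action; start-up
    power is paid only when a server receives its first task. Link energy
    is omitted for brevity.
    """

    def __init__(self, cpu_capacity, task_cpu, p_idle=100.0, p_per_cpu=10.0):
        self.cpu_capacity = list(cpu_capacity)
        self.task_cpu = list(task_cpu)  # assumed positive demands
        self.p_idle = p_idle
        self.p_per_cpu = p_per_cpu
        self.reset()

    def reset(self):
        self.next_task = 0
        self.remaining = list(self.cpu_capacity)
        return self._state()

    def _state(self):
        return (self.next_task, tuple(self.remaining))

    def step(self, server):
        # The next state depends only on the current state and the action
        # (Markov property), not on how the current state was reached.
        demand = self.task_cpu[self.next_task]
        if demand > self.remaining[server]:
            raise ValueError("action violates the computing capacity constraint")
        first_task = self.remaining[server] == self.cpu_capacity[server]
        self.remaining[server] -= demand
        extra_power = (self.p_idle if first_task else 0.0) + self.p_per_cpu * demand
        self.next_task += 1
        done = self.next_task == len(self.task_cpu)
        return self._state(), -extra_power, done


def discounted_returns(rewards, gamma=0.99):
    """G_t = sum_k gamma**k * r_{t+k} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```

With two servers of capacity 4 and two unit-demand tasks, placing both tasks on server 0 yields rewards -110 and -10, whereas spreading them over both servers yields -110 twice, so a policy maximizing the return prefers consolidation for any non-negative `gamma`.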

Because the edge computing network environment is complex and changeable, learning in such a challenging environment requires a reliable and scalable intelligent algorithm [

The PPO algorithm is a deep reinforcement learning algorithm based on the actor-critic framework. Its architecture contains two actor networks, Actor 1 and Actor 2. Actor 1 represents the latest policy

The PPO algorithm optimizes the parameter

where

The above formula gives the PPO policy update, but it has a problem: the hyperparameter

When the policy does not change,
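Assuming the clipped variant of PPO (PPO-Clip), the surrogate objective uses the probability ratio between the latest policy (Actor 1) and the old policy (Actor 2) and clips it to the interval [1 - epsilon, 1 + epsilon]; when the policy does not change, the ratio equals 1 and the objective reduces to the mean advantage. The sketch below computes this objective for a batch in plain Python. The function name is hypothetical, and `epsilon = 0.2` is a commonly used default rather than a value taken from the paper.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean clipped surrogate objective of PPO-Clip (to be maximized).

    new_log_probs: log pi_theta(a_t | s_t) under the latest policy (Actor 1).
    old_log_probs: log pi_theta_old(a_t | s_t) under the old policy (Actor 2).
    advantages: advantage estimates A_t for the same transitions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # r_t = pi_theta / pi_theta_old
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum gives a pessimistic bound on the improvement.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Because of the minimum, a ratio far above 1 cannot raise the objective beyond (1 + epsilon) times a positive advantage, which limits how far a single update can move the policy away from the old one.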

To optimize energy consumption during edge task offloading, the algorithm designed in this paper consists of three modules: 1) construction of the edge network environment and parameter setting; 2) training of the edge task offloading model; 3) output of the energy-aware offloading scheme.

As mentioned in Section 4.2, the actor of the PPO algorithm designed in this paper is composed of two neural networks, Actor 1 and Actor 2. Actor 1 guides the agent's interaction with the environment, collecting transition samples and caching them. The parameters of Actor 2 represent the old policy. After a certain number of iterations, the parameters of Actor 1 are used to update the parameters of Actor 2. The critic consists of a single neural network. The training steps of the offloading model are as follows:

Based on above analysis, the edge task unloading algorithm based on PPO is described in the following table.

Algorithm: Edge task offloading algorithm based on PPO |
---|
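Two ingredients of the training procedure described above can be sketched as follows: advantage estimates computed from the critic's value predictions, and the periodic copy of Actor 1's parameters into Actor 2. The one-step temporal-difference advantage, the dictionary representation of network parameters, and the names `td_advantages`, `sync_old_policy`, and `sync_every` are assumptions; the paper may use a different advantage estimator.

```python
import copy


def td_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step TD advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    values and next_values are the critic's estimates for s_t and s_{t+1};
    the bootstrap term is dropped when the episode ends at step t.
    """
    return [
        r + (0.0 if done else gamma * nv) - v
        for r, v, nv, done in zip(rewards, values, next_values, dones)
    ]


def sync_old_policy(iteration, sync_every, actor1_params, actor2_params):
    """Every sync_every iterations, copy Actor 1 (latest policy) into Actor 2 (old policy).

    A deep copy keeps the old policy fixed while Actor 1 keeps being updated.
    """
    if iteration % sync_every == 0:
        return copy.deepcopy(actor1_params)
    return actor2_params
```

Keeping Actor 2 frozen between synchronizations is what allows the probability ratio in the PPO objective to be computed against a fixed reference policy.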

In this section, two network topologies are used to evaluate the algorithm proposed in the previous section. The first topology consists of five edge servers and eight physical links; the second consists of eight edge servers and 12 physical links. To verify the energy optimization performance of the proposed task offloading algorithm, the energy consumption for 10, 15, 20 … 60 task queues on the terminal devices is simulated and measured. We assume that three types of terminal tasks need to be offloaded to the edge network, and the detailed parameter settings of each task type are shown in the

Task type | Throughput (Mbps) | CPU demand |
---|---|---|
1 | 100 | 1 |
2 | 80 | 1 |
3 | 50 | 4 |
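For simulation scripts, the task parameter table and the queue sizes 10, 15, ..., 60 can be encoded as plain data. In the sketch below, the uniform random choice of task type for each queue entry is a hypothetical assumption, since the section does not state how queues are composed; CPU demand is kept in the table's unspecified units, and all names are illustrative.

```python
import random

# Task types from the parameter table (CPU demand in the table's units).
TASK_TYPES = {
    1: {"throughput_mbps": 100, "cpu": 1},
    2: {"throughput_mbps": 80, "cpu": 1},
    3: {"throughput_mbps": 50, "cpu": 4},
}

# Queue sizes 10, 15, 20, ..., 60 used in the experiments.
QUEUE_SIZES = list(range(10, 61, 5))


def make_task_queue(n_tasks, seed=0):
    """Hypothetical generator: draw each task type uniformly at random."""
    rng = random.Random(seed)
    types = sorted(TASK_TYPES)
    return [rng.choice(types) for _ in range(n_tasks)]


def queue_demand(queue):
    """Aggregate CPU demand and throughput (Mbps) of one task queue."""
    cpu = sum(TASK_TYPES[t]["cpu"] for t in queue)
    throughput = sum(TASK_TYPES[t]["throughput_mbps"] for t in queue)
    return cpu, throughput
```

Seeding `random.Random` per queue keeps each simulated queue reproducible across the compared algorithms.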

This paper uses the following four indicators to evaluate the proposed algorithm: 1.

To verify the effectiveness of the proposed algorithm, this section compares the algorithm proposed in the previous section (PPO_EM) with a random algorithm and with a PPO-based task offloading algorithm that does not consider start-up energy consumption (PPO_NEM).
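A random baseline of the kind used for comparison might look as follows. The section does not state whether the random algorithm respects server capacity; this sketch assumes it picks uniformly among servers that still have enough CPU, and `random_placement` is a hypothetical name.

```python
import random


def random_placement(task_cpu, cpu_capacity, seed=None):
    """Hypothetical random baseline: place each task on a server chosen
    uniformly among those with enough remaining CPU.

    Returns a list mapping task index -> server index, or None if some
    task cannot be placed.
    """
    rng = random.Random(seed)
    remaining = list(cpu_capacity)
    placement = []
    for demand in task_cpu:
        feasible = [s for s, cap in enumerate(remaining) if cap >= demand]
        if not feasible:
            return None
        server = rng.choice(feasible)
        remaining[server] -= demand
        placement.append(server)
    return placement
```

Because this baseline ignores energy entirely, it tends to spread tasks over many servers and therefore pays start-up power more often than an energy-aware policy.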

PPO_EM is implemented on a computer equipped with an Intel(R) Core i5-9300H CPU and 16 GB of memory. The software environment is Python 3.7.4 and TensorFlow 1.15.0;

In this paper, we focus on optimizing the energy consumption of edge servers during task offloading. To improve the accuracy of server energy consumption evaluation, we first construct a server energy consumption model that includes both start-up energy consumption and processing energy consumption, and then formulate an energy consumption optimization problem based on this model. Next, we propose a PPO-based task offloading strategy to find a near-optimal task offloading scheme. Simulation results show that, compared with the random algorithm, the proposed algorithm saves 22.69% energy on average.