The solution strategy of a heuristic algorithm is preset and performs well in conventional cloud resource scheduling. However, for complex and dynamic cloud service scheduling tasks, service attributes differ, so a single strategy solves such problems inefficiently. In this paper, we present a hyper-heuristic algorithm based on reinforcement learning (HHRL) to optimize the completion time of the task sequence. First, in the reward table setting stage of HHRL, we introduce population diversity and integrate the maximum time to jointly determine the task scheduling and the selection of low-level heuristic strategies. Second, a task computational complexity estimation method integrated with linear regression is proposed to influence task scheduling priorities. In addition, we propose a high-quality candidate solution migration method to ensure the continuity and diversity of the solving process. Compared with HHSA, ACO, GA, F-PSO, etc., HHRL can quickly obtain task complexity, select appropriate heuristic strategies for task scheduling, search for the best makespan, and has stronger disturbance detection ability for population diversity.

The development of cloud computing provides stable and efficient solutions for the information industry, which attracts many researchers to study application problems. The resources in a cloud environment are shared among users through virtualization technology [

Based on virtualization technology, cloud resource providers create multiple Virtual Machines (VMs) [

Currently, traditional algorithms or heuristic algorithms are generally used to achieve task scheduling in the cloud environment. However, the above algorithms still have two problems.

Traditional scheduling algorithms have the advantages of stable performance and easy implementation. However, large-scale task scheduling problems in complex cloud environments are often NP-hard. Common non-heuristic task scheduling algorithms such as First Come First Serve (FCFS), max-min [

A task's computational complexity, that is, the time it consumes per unit of computing power, cannot be accurately estimated in practical application scenarios. The computational complexity of different tasks cannot be expressed by a single linear relationship, which makes it difficult to obtain the computational complexity as accurately as in a simulation environment. Moreover, task complexity is critical to the selection of the underlying algorithm and to the overall effect of the algorithm, especially in real application scenarios.

This work proposes a hyper-heuristic algorithm based on reinforcement learning (HHRL). HHRL uses a high-level heuristic based on a reward table updated with iterations. The makespan and the population diversity form the state information. PSO [

To accurately predict the computational complexity of tasks, this paper proposes a task complexity estimation method based on linear regression. Although it is difficult to estimate the computational complexity across different categories of tasks, there is an obvious linear relationship among tasks of the same category. We record the execution of 100 tasks in each category and explore the linear relationship among them. The results show that the linear relationship clearly exists and can significantly improve the accuracy of task complexity estimation.

The remainder of the paper is organized as follows. Related work on task scheduling in cloud computing is given in Section 2. Section 3 introduces the scheduling problem in the cloud environment and the relevant technologies of hyper-heuristic algorithms. Section 4 describes the proposed algorithm in detail. The experimental results, on both CloudSim and a real cloud server, are discussed in Section 5. Finally, Section 6 concludes this work and presents future research.

Effective scheduling of tasks submitted by users can reduce both the cost and the resource consumption of the cloud computing center. From rule-based algorithms to meta-heuristic algorithms, and then to hyper-heuristic algorithms, scheduling algorithms have gained a wider search range and have gradually acquired evolutionary ability.

Many scheduling algorithms are rule-based. Zhu et al. [

Representative heuristic algorithms are also used in task scheduling. Chaudhary et al. [

To get better solutions, Hybrid heuristic algorithms [

This section describes the task scheduling problem and the hyper-heuristic scheduling algorithm in Cloud Computing.

A cloud computing center virtualizes physical resources and establishes multiple VMs with different performance levels. In practical applications, since the creation and initialization of virtual machines lead to unnecessary consumption of time and energy, users' tasks are generally executed on existing VMs.

Part of the performance of

The essence of the task scheduling algorithm in cloud computing is to schedule the task to the appropriate virtual machine and complete the execution in less time. Since VMs and tasks have various characteristics, the results of different scheduling schemes will be significantly different. For example,

A cloud computing system is composed of a large number of Physical Machines (PMs).

where

where

where

The characteristics of the

where

Assuming that

The time finished upload (

The executing cost time (

Assuming that

The download cost time (

The Expect Finish Time (

Assuming that

A Feasible Solution (

In 2001, Cowling proposed the Hyper-heuristic algorithm and used it to solve the scheduling problem. Cowling described the Hyper-heuristic algorithm as the heuristic algorithm to find the heuristic algorithm. With the development of research, Hyper-heuristic algorithms are used to manage or manipulate a series of low-level heuristic algorithms to solve various combinatorial optimization problems [

Aspect | Hyper-heuristic algorithm | Heuristic algorithm |
---|---|---|
Search space | Set of low-level heuristic algorithms | Solution space of the problem |
Professional knowledge | Less or no problem domain knowledge is required | Knowledge of intelligent computing and the problem domain is required |
Scope of application | Can be applied to different problems | To solve new problems, redesign is generally needed |

Generally speaking, the hyper-heuristic algorithm aims to extend intelligent computing technology to more fields and to reduce the difficulty of heuristic algorithm design. A typical hyper-heuristic algorithm consists of a control domain and a problem domain. The problem domain contains the constraints, basic functions, evaluation functions and low-level heuristics (LLH) designed by domain experts. The high-level heuristic in the control domain is designed by hyper-heuristic experts and specifies how to use the low-level heuristics to construct feasible solutions or improve the quality of solutions. A standard interface between the problem domain and the control domain is defined for information transfer between the two layers.
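The separation between the two domains can be illustrated with a minimal Python sketch. It is not the HHRL procedure itself: the solution encoding, the function names (`run_hyper_heuristic`, `greedy_choice`) and the scoring rule are assumptions made only for illustration. The point is that the control loop sees nothing but LLH indices and cost values, which plays the role of the standard interface.

```python
from typing import Callable, List, Sequence

# Hypothetical encoding: solution[i] is the index of the VM that runs task i.
Solution = List[int]
LowLevelHeuristic = Callable[[Solution], Solution]


def greedy_choice(scores: Sequence[float]) -> int:
    """Illustrative high-level rule: pick the LLH with the highest score so far."""
    return max(range(len(scores)), key=lambda k: scores[k])


def run_hyper_heuristic(
    initial: Solution,
    llhs: Sequence[LowLevelHeuristic],
    evaluate: Callable[[Solution], float],
    choose: Callable[[Sequence[float]], int],
    rounds: int,
) -> Solution:
    """Control domain: selects an LLH each round using only scores and costs."""
    best, best_cost = list(initial), evaluate(initial)
    scores = [0.0] * len(llhs)           # one running score per LLH
    for _ in range(rounds):
        k = choose(scores)               # high-level decision
        candidate = llhs[k](best)        # problem domain does the actual search
        cost = evaluate(candidate)
        scores[k] += best_cost - cost    # credit the LLH with its improvement
        if cost < best_cost:             # keep the candidate only if it is better
            best, best_cost = candidate, cost
    return best
```

Any problem-specific knowledge lives in `llhs` and `evaluate`; swapping in a different problem requires no change to the loop.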

The task scheduling application of a general hyper-heuristic algorithm in cloud computing is shown in

The flow of the whole algorithm is shown in

In this section, we present the hyper-heuristic algorithm based on reinforcement learning to solve the task scheduling problem in cloud computing. Moreover, a task computational complexity estimation method is used to accurately obtain the execution cost time of each task.

To learn a high-level heuristic method based on current state information, HHRL uses makespan and population diversity to construct a reward table. The pseudocode of HHRL is shown in Algorithm 2.

The action set is composed of ACO, GA, F-PSO and PSO. Although particle swarm optimization can find a solution quickly, the solution quality is not stable. Fuzzy-PSO improves the efficiency of conventional PSO by using fuzzy logic systems, although it is less convergent. The ant colony algorithm can also search for the optimal solution quickly, but it easily falls into local optima. Because the sequence encoding of GA candidate solutions is well suited to representing task schedules, GA performs well in task scheduling, which the experiments also confirm.

The state information consists of makespan and the population diversity. The calculation method of makespan is introduced in Section 3. In a meta-heuristic algorithm, the other candidate solutions are gradually fitted to the optimal solution, so the difference between the optimal solution and the other candidate solutions indicates the diversity of the current candidate solution set. In HHRL, the population diversity is represented by the Hamming distance between the optimal solution and other candidate solutions. The set

The high-level heuristic method selects a low-level heuristic algorithm (LLH). After the iterations of the LLH, the execution result of the current algorithm can be obtained. The reward value consists of two parts. The first part of the reward value is determined by the makespan of the current iteration result and the makespan obtained from the previous iteration. The reward value of the current LLH iteration can be represented as:

When selecting the LLH, the high-level heuristic is guided by the reward values in the current reward table under the current state information. The LLH with the highest reward value is selected with a probability of 50%; otherwise, one of the other actions is selected at random.
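The selection rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reward table is assumed to be a nested dictionary keyed by a diversity state and then by LLH name, the reward numbers are made up, and "selected at random" is read as a uniform choice among the remaining actions. The names `select_llh` and `reward_table` are hypothetical.

```python
import random
from typing import Dict, Optional


def select_llh(
    reward_row: Dict[str, float],
    greedy_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the highest-reward LLH with probability greedy_prob, else another one."""
    rng = rng if rng is not None else random.Random()
    best = max(reward_row, key=lambda name: reward_row[name])
    if rng.random() < greedy_prob:
        return best
    # Assumption: the non-greedy branch draws uniformly from the other actions.
    others = [name for name in reward_row if name != best]
    return rng.choice(others) if others else best


# Hypothetical reward table: one row per state, one entry per LLH.
reward_table = {
    "high":   {"GA": 1.2, "ACO": 0.6, "PSO": 0.1, "F-PSO": 0.2},
    "medium": {"GA": 0.7, "ACO": 0.8, "PSO": 0.4, "F-PSO": 0.5},
    "low":    {"GA": 0.3, "ACO": 0.2, "PSO": 0.9, "F-PSO": 1.0},
}

chosen = select_llh(reward_table["high"], rng=random.Random(0))
```

Keeping half of the probability mass on non-greedy actions lets every LLH continue to receive reward updates, so the table does not lock onto an early favourite.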

The candidate solution

Generally speaking, the computational complexity of the heuristic algorithm is , where represents the iteration round of the algorithm,

The algorithm proposed in this paper is solved by the selected

The essence of the task scheduling algorithm in cloud computing is to schedule each task to an appropriate virtual machine and complete the execution in a short time. For example, CPU-intensive tasks ought to be assigned to resources with high processing speed, whereas I/O-intensive tasks are better assigned to resources with large bandwidth.
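How much the assignment matters can be seen in a small Python sketch. It is a simplification under explicit assumptions: execution time is taken as task length divided by VM speed (the usual CloudSim convention of instructions over MIPS), tasks on one VM are summed, and the upload and download times from Section 3 are ignored. The function name `makespan` and the numbers are illustrative only.

```python
from typing import Sequence


def makespan(
    assignment: Sequence[int],
    task_lengths: Sequence[float],
    vm_mips: Sequence[float],
) -> float:
    """Completion time of the busiest VM for a given task-to-VM assignment."""
    busy = [0.0] * len(vm_mips)
    for task, vm in enumerate(assignment):
        # Assumption: execution time = length / MIPS; transfer times are ignored.
        busy[vm] += task_lengths[task] / vm_mips[vm]
    return max(busy)


lengths = [10000.0, 1000.0]   # one heavy task, one light task
mips = [500.0, 5000.0]        # one slow VM, one fast VM

heavy_on_fast = makespan([1, 0], lengths, mips)   # 2.0
heavy_on_slow = makespan([0, 1], lengths, mips)   # 20.0
```

With the same two tasks and two VMs, swapping the assignment changes the makespan by a factor of ten, which is why the complexity estimate fed to the scheduler needs to be accurate.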

In the simulation environment, the characteristics of the tasks can be accurately obtained. In a real application environment, however, although the category of a task and the size of its relevant data can be accurately obtained, its computational complexity cannot be accurately estimated. Because the computational complexity of a task depends on the task category, it is not linearly related to the size of the relevant data across categories. Computational complexity is vital priority information in task scheduling. Inaccurate computational complexity degrades the performance of the scheduling algorithm and prevents it from obtaining high-quality scheduling results. A method to estimate the complexity of three common cloud computing tasks is proposed. For a particular category of task, the computational complexity is often related to the size of the relevant data. The regression formula of task complexity can be represented as:

In this section, the proposed method HHRL is compared with other traditional and meta-heuristic algorithms. The empirical results in the CloudSim simulation environment and on real tasks show that the proposed HHRL maintains the population diversity effectively and outperforms the other algorithms.

In this experiment, CloudSim 4.0 is used to test the algorithms. The performance of HHRL is evaluated and compared with existing meta-heuristic algorithms such as PSO, F-PSO, GA and ACO, and traditional algorithms such as FIFO and Max-min. The parameters of meta-heuristic algorithms are given in

Algorithm | Parameters | Value |
---|---|---|
HHRL | The iterations of LLH | 50 |
HHRL | Population size | 100 |
HHSA | The max iterations of LLH | 50 |
HHSA | The max iterations of no-improved LLH | 5 |
HHSA | Threshold of diversity | 2500 |
GA | Crossover probability | 0.8 |
GA | Mutation rate | 0.01 |
PSO | Inertia weight | 0.8 |
PSO | Cognitive coefficient | 1.0 |
PSO | Cognitive coefficient | 1.0 |
ACO | Pheromone residue coefficient | 0.5 |
ACO | Information elicitation factor | 0.5 |
ACO | Relative influence weights | 1 |

Entity | Parameters | Value |
---|---|---|
Cloudlet | No. of cloudlets | 500–5000 |
Cloudlet | Length | 500–15000 |
Virtual machine | No. of VMs | 10 |
Virtual machine | RAM | 1024–4096 MB |
Virtual machine | MIPS | 500–5000 |
Virtual machine | Bandwidth | 1000–5000 |
Virtual machine | Policy type | Time shared |
Virtual machine | VMM | Xen |
Virtual machine | Operating system | Linux |
Virtual machine | No. of cores | 1–2 |
Physical machine | No. of PMs | 2 |
Physical machine | RAM | 20480 MB |
Physical machine | Storage | 100000 MB |
Physical machine | Bandwidth | 20000 |
Physical machine | Policy type | Time shared |

The result of each experiment is the makespan of the current scheduling scheme. The average and best results obtained by the traditional algorithms and the proposed HHRL algorithm over 30 runs are given in

No. | Algorithm | HHRL | HHSA | GA | ACO | F-PSO | PSO | FIFO | Max-min |
---|---|---|---|---|---|---|---|---|---|

500 | Best | 158.43 | 157.50 | 188.47 | 218.52 | 162.85 | |||

Average | 159.42 | 160.21 | 163.40 | 162.83 | 203.90 | 218.52 | 162.85 | ||

1000 | Best | 300.89 | 302.07 | 313.28 | 310.77 | 382.58 | 427.42 | 324.62 | |

Average | 303.16 | 307.04 | 317.72 | 321.18 | 395.81 | 427.42 | 324.62 | ||

3000 | Best | 962.85 | 973.24 | 992.86 | 1000.85 | 1022.67 | 1169.32 | 1011.53 | |

Average | 966.73 | 978.17 | 1002.39 | 1011.61 | 1036.27 | 1169.32 | 1011.53 | ||

5000 | Best | 1600.29 | 1600.02 | 1623.83 | 1632.36 | 1681.83 | 1826.54 | 1674.63 | |

Average | 1603.92 | 1607.43 | 1633.69 | 1646.41 | 1697.83 | 1826.54 | 1674.63 |

As shown in

In the iteration of the meta-heuristic algorithm, the population diversity significantly impacts the optimal result. With the continuous fitting of candidate solutions to the optimal solutions, the population diversity will decrease. If the population diversity decreases too fast, it is easy to fall into local optimum, and better candidate solutions cannot be obtained. The variation of population diversity of HHRL and other meta-heuristic algorithms is shown in

From this result, it can be observed that the population diversity of PSO and F-PSO gradually decreases, then tends to stabilize and remains at a high level. The high population diversity means that PSO falls into local optima in many places, and no high-quality scheduling results are found. In contrast, the population diversity of ACO and GA converges and yields better scheduling results. However, the set of candidate solutions is completely fitted to the optimal solution, which leads to serious convergence to a local optimum. HHSA has a diversity detection operator: when the population diversity is lower than the preset threshold, disturbance information is introduced to improve the population diversity. As a result, the population diversity varies dramatically. HHRL rewards the population diversity, which effectively avoids the excessive convergence or violent fluctuation of population diversity. This ensures the balance between search range and algorithm efficiency.
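The diversity measure behind this comparison, described in Section 4 as the Hamming distance between the optimal solution and the other candidate solutions, can be sketched in Python. The aggregation is an assumption: the distances are summed over the population here, and the paper's exact formula may normalize differently. The function names are hypothetical, and each solution is assumed to be a list of VM indices, one per task.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of tasks that two schedules place on different VMs."""
    if len(a) != len(b):
        raise ValueError("solutions must schedule the same number of tasks")
    return sum(x != y for x, y in zip(a, b))


def population_diversity(
    best: Sequence[int],
    population: Sequence[Sequence[int]],
) -> int:
    """Assumed aggregation: total Hamming distance from the best solution."""
    return sum(hamming_distance(best, candidate) for candidate in population)


best = [0, 1, 2]
population = [[0, 1, 2], [0, 1, 1], [2, 1, 0]]
diversity = population_diversity(best, population)   # 0 + 1 + 2 = 3
```

A value of zero means every candidate has collapsed onto the best solution, which corresponds to the fully converged behaviour attributed to ACO and GA above.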

The population diversity as HHRL state information directly affects the selection of LLH. To analyze the influence of the population diversity on LLH selection, this work counted the 6,000 selections of LLH obtained from 30 runs of HHRL, and the results obtained are shown in

Of the 6,000 selections, 1126 were made in the high-diversity state, 2856 in the medium state and 2018 in the low state. When population diversity is in the high state, GA is selected most often, while PSO and F-PSO are rarely selected. At the beginning of the HHRL iteration, an increase in population diversity yields only a small reward value, while a decrease in makespan yields a larger one. Therefore, GA, which can significantly reduce makespan, is selected many times. In contrast, when population diversity is in the low state, PSO and F-PSO are selected more often. Because PSO and F-PSO can effectively maintain the population diversity, they bring a large reward value.

In the simulation environment, the computational complexity of a task is assumed to be known accurately. In practical application scenarios, however, tasks belong to different categories, so their complexity is not easy to estimate, and this significantly affects the scheduling effect. Across all tasks, computational complexity does not show a simple linear relationship with the size of the related data because the task categories differ. For example, training a neural network model with a 2000 MB data set on a virtual machine consumes significantly more computing power than testing alone. However, within a given task category, there is a linear relationship between the data size and the computational complexity, which makes it possible to estimate the computational complexity. To verify the effect of this method on the scheduling result, linear regression analysis was carried out on three categories of tasks: CNN model training, CNN model testing and RNN model testing. The experimental environment is shown in

Configuration | |
---|---|
CPU | Intel Core i5-8500 3.00 GHz |
GPU | Nvidia GeForce GTX 1080Ti |
RAM | 16 GB |
Operating System | Linux |
Storage | 1 TB |
Policy | Time Shared |

Category | Task | Slope | RMSE | R-square | P-value |
---|---|---|---|---|---|
Category 1 | CNN model training | 2.345 | 54.5 | 0.599 | 1.74e-10 |
Category 2 | CNN model testing | 0.501 | 8.73 | 0.833 | 3.84e-37 |
Category 3 | RNN model testing | 0.290 | 5.53 | 0.819 | 1.67e-18 |
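The slope, RMSE and R-square columns correspond to an ordinary least-squares fit of measured execution time against data size, one fit per task category. The Python sketch below shows how such values could be computed. It is an illustration under assumptions rather than the authors' procedure: a straight line with an intercept is assumed, the function names are hypothetical, the sample data are invented, and the p-value is omitted because it requires a t-distribution that the standard library does not provide.

```python
import math
from typing import Sequence, Tuple


def fit_line(sizes: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for times ~ slope * sizes + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse_and_r2(
    sizes: Sequence[float],
    times: Sequence[float],
    slope: float,
    intercept: float,
) -> Tuple[float, float]:
    """Root-mean-square error and coefficient of determination of the fit."""
    mean_y = sum(times) / len(times)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(sizes, times))
    ss_tot = sum((y - mean_y) ** 2 for y in times)
    return math.sqrt(ss_res / len(times)), 1.0 - ss_res / ss_tot


def estimate_complexity(size: float, slope: float, intercept: float) -> float:
    """Predicted complexity of a new task of the same category."""
    return slope * size + intercept


# Invented measurements for one category: data size versus execution time.
sizes = [1.0, 2.0, 3.0, 4.0]
times = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(sizes, times)
rmse, r2 = rmse_and_r2(sizes, times, slope, intercept)
```

Fitting one model per category, rather than one model for all tasks, reflects the observation that the linear relationship holds only within a category.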

The CNN models include a classification model and a generative model based on VGG-16 and ResNet-34. The RNN model performs text generation based on GRU. As shown in

The experimental steps are as follows. First, the execution time of 100 tasks in each category on the host is obtained, which is called complexity 1, and the linear relationship between the computational complexity of each category of tasks and the size of the relevant data is derived. Then the computational complexity of all tasks is predicted, which is called complexity 2. In addition, there is a complexity 3 based only on the size of the relevant data. In CloudSim, scheduling result 1 based on complexity 3 and scheduling result 2 based on complexity 2 are obtained. Note that complexity 1 is the real computational power consumption, while complexity 2 and complexity 3 are estimates. The two scheduling results are obtained from complexity 2 and complexity 3, but the real makespan of each scheduling result still needs to be computed from complexity 1. The average and best results over 30 runs for the traditional algorithms and the proposed HHRL algorithm are given in

HHRL | HHSA | GA | ACO | F-PSO | PSO | FIFO | Max-min | ||
---|---|---|---|---|---|---|---|---|---|

Result 1 | Best | 82.57 | 82.04 | 83.51 | 82.63 | 82.60 | 82.78 | 94.56 | 91.57 |

Average | 88.12 | 89.70 | 91.06 | 90.46 | 93.36 | 93.47 | 94.56 | 91.57 | |

Result 2 | Best | 82.16 | 82.58 | 94.38 | 84.39 | ||||

Average | 85.80 | 86.22 | 87.75 | 89.42 | 92.37 | 94.38 | 84.39 |

As shown in

From the experimental results in this section, it can be concluded that the proposed HHRL algorithm obtains better results than the traditional scheduling algorithms. Experiments in CloudSim show that HHRL can effectively reduce makespan and improve the stability of the scheduling scheme. This improvement comes from rewarding increases in population diversity. The results on real tasks show that makespan can be effectively improved after the task complexity is estimated, and the experimental conclusion is similar to that in the simulation environment.

This work proposed a hyper-heuristic algorithm based on reinforcement learning (HHRL). HHRL obtains its reward value from makespan and population diversity. The action set of HHRL is formed by four meta-heuristic algorithms: GA, ACO, PSO, and F-PSO. By rewarding decreases in makespan and improvements in population diversity, HHRL maintained the population diversity and obtained a better scheduling scheme. This work also proposed a linear-regression-based method for estimating task computational complexity in a cloud environment, in order to reduce the estimation error. First, we carried out experiments with 500 to 5000 tasks in CloudSim. The results of 30 runs of each algorithm showed that HHRL obtained better average and best results than the other algorithms. The results also showed that the variance of HHRL was the smallest, which indicated that its solutions were the most stable. By analyzing population diversity, we found that HHRL can effectively maintain population diversity by rewarding its improvement. The relationship between the selection frequency of each LLH and the population diversity was also explained. In addition, the task complexity estimation method proposed in this work was verified to be effective. Experiments show that with this estimation method, the results of the scheduling algorithms are significantly improved.

In the future, we plan to study a hyper-heuristic scheduling algorithm that optimizes multiple objectives, such as load balancing, energy consumption, and user expenditure, in a cloud environment. We also plan to implement another version of HHRL that can schedule workflows. In addition, we intend to investigate a method for fitting the computational complexity of various tasks synchronously during the iterative process of the scheduling algorithm.

No further acknowledgments to provide than those already listed in the authors’ section. In addition, the authors thank the anonymous reviewers for providing valuable comments to improve this paper.

This work was supported in part by the National Key R & D Program of China under Grant 2017YFB1302400, the Jinan “20 New Colleges and Universities” Funded Scientific Research Leader Studio under Grant 2021GXRC079, the Major Agricultural Applied Technological Innovation Projects of Shandong Province under Grant SD2019NJ014, the Shandong Natural Science Foundation under Grant ZR2019MF064, and the Beijing Advanced Innovation Center for Intelligent Robots and Systems under Grant 2019IRS19.

The authors declare that they have no conflicts of interest to report regarding the present study.