Unmanned Aerial Vehicles (UAVs), or drones, originally introduced for military applications, are gaining popularity in several other fields as well, such as security and surveillance, owing to their ability to perform repetitive and tedious tasks in hazardous environments. This growing demand has created the need for UAVs to traverse independently through three-dimensional (3D) flight environments containing various obstacles, a problem that metaheuristics have addressed efficiently in past literature. However, no single optimization algorithm can solve every kind of optimization problem effectively; there is therefore a pressing need to integrate metaheuristics for wider applicability. To address this issue, this paper introduces a novel reinforcement-learning-controlled Grey Wolf Optimisation-Archimedes Optimisation Algorithm (QGA), which is first exhaustively validated on 22 benchmark functions and then utilized to obtain an optimal, collision-free flyable path for UAVs in a 3D environment. The performance of the developed QGA has been compared against various metaheuristics. The simulation results reveal that QGA acquires a feasible and effective flyable path more efficiently in complicated environments.

Unmanned Aerial Vehicles (UAVs) are an evolving aerospace technology with immense potential for numerous applications. Although formerly devoted to defence applications, the range of UAV applications has been extended to several commercial and domestic fields. Disaster management and disaster zone mapping [

The path planning of UAVs in a 3D environment can be formulated as an optimisation problem, for which a fitness function can be designed and an optimisation technique devised to obtain the optimal solution path. The objective of path planning is to construct a collision-free flight route for the UAV from the start position to the target position where it is deployed. Along with collision avoidance, other constraints, such as the energy consumption and the time required to traverse the desired path, are also considered when framing the objective function. Throughout the literature, metaheuristic algorithms have proved to be successful optimisation techniques for several engineering optimisation problems, owing to advantages such as simplicity and ease of implementation [

For 3D path planning of UAVs, Particle Swarm Optimisation (PSO) and Global Best path competition (GBPSO) have been extensively employed in the past [

Moreover, with the advent of Machine Learning (ML), extensive work has been reported on elevating the optimisation capabilities of metaheuristics. In this sequence, Reinforcement Learning (RL) has commonly been utilised to control the exploration and exploitation behaviours [

Therefore, the present work investigates the possibility of developing a novel metaheuristic by combining AOA and GWO. Further, an RL-based algorithm selection mechanism has been developed in order to reach new horizons with the proposed metaheuristic. In AOA, the capability to optimise any real-time optimisation problem is controlled by a transfer operator that maintains the global-local search balance. The transfer operator ensures that the first 30% of iterations are devoted to exploration, updating individual positions based on the position of a randomly selected individual. During the next phase, exploitation, the positions of the individuals are updated using the position of the best individual. However, the low convergence speed of AOA is a consequence of the absence of exploitation during the first one-third of the iterations. Additionally, previous literature has comprehensively employed GWO for UAV path planning because of its higher convergence speed and its capability to generate optimal paths amid obstacles [

To effectively address the above-mentioned issues, the Q-learning controlled GWO-AOA (QGA) is proposed in the present work. The introduced algorithm combines the exploitation capabilities of GWO with the exploration abilities of AOA. For maintaining the balance between global and local search,

The performance of the developed QGA has been validated on 22 popular benchmark functions and then employed to estimate the optimum path for UAVs. For this purpose, the results obtained for the benchmark functions have been compared with GWO, AOA, RL-based GWO (RLGWO) [

Development of Q-controlled GWO-AOA (QGA) to enhance the exploration-exploitation balance.

Development of algorithm selection mechanism to efficiently select the employed metaheuristics based on Q-State parameters.

Exhaustive validation of developed QGA against other metaheuristics for various benchmark functions based on statistical tests.

Examining the employability of QGA for real-time path-planning optimization problem of UAV in 3D environment.

The rest of the paper is organised as follows. Section 2 includes the brief description about Q learning, GWO, and AOA. The detailed description of the proposed QGA is presented in Section 3. The path planning problem statement is formulated in Section 4. The performance analysis of QGA over benchmark functions and the effectiveness of proposed algorithm for path planning in 3D flying field is presented in Section 5. Lastly, Section 6 concludes the work with discussions on future research.

RL comprises four sub-elements: a policy, a reward signal, a value function, and a model of the environment. RL depends on the interaction between the learning agent and its environment. The reward signal is the feedback from the environment to the RL agent, and it quantifies this interaction. Q learning is an off-policy Temporal Difference (TD) method, where Q denotes the action-value function. One of the main components of Q learning is the Q table. The reward is the reinforcement signal which guides the learning agent during the process, and the main objective of the learning agent is to maximise this reward signal, on which the Q value depends. During each operation, the Q table is updated based on the

where,
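The Q-table update referred to above is the standard temporal-difference rule. A minimal sketch follows; the learning rate α and discount factor γ values are conventional placeholders, not values taken from this paper:

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard temporal-difference Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

# Tiny 2-state, 2-action table (the shape later used for the AOA/GWO choice).
Q = np.zeros((2, 2))
Q = q_update(Q, state=0, action=1, reward=1.0, next_state=1)
print(Q[0, 1])  # 0.1
```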

The Archimedes Optimisation Algorithm (AOA) is based on Archimedes' principle, which relates the buoyant force, or upthrust, to the weight of the displaced fluid. According to Archimedes' principle, if a body is completely or partially immersed in a liquid, the net upward force acting on the body, known as the buoyant force, equals the weight of the liquid displaced by the body. If the weight of the body is greater than this upward force, the body sinks into the liquid, whereas if the weight of the body equals the weight of the displaced liquid, the body floats. The initial population of AOA consists of various bodies with different densities and volumes, each of which tries to reach an equilibrium state of neutral buoyancy. This can be mathematically represented by

where,
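For reference, in the original AOA formulation this equilibrium (the buoyant force balancing the body's weight) is typically written as follows; the den/vol/acc notation for density, volume, and acceleration follows the AOA literature and is an assumption about the elided equations:

```latex
F_b = W_o
\quad\Longrightarrow\quad
\mathrm{den}_b\,\mathrm{vol}_b\,\mathrm{acc}_b
  = \mathrm{den}_o\,\mathrm{vol}_o\,\mathrm{acc}_o
\quad\Longrightarrow\quad
\mathrm{acc}_o
  = \frac{\mathrm{den}_b\,\mathrm{vol}_b\,\mathrm{acc}_b}{\mathrm{den}_o\,\mathrm{vol}_o}
```

Here the subscript $b$ refers to the fluid exerting the buoyant force and $o$ to the immersed object.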

To obtain the acceleration of the body,

If other forces such as collision with other bodies are affecting the body under consideration,

where,

The initial position (

where,

The density factor and transfer operator help in maintaining the exploration-exploitation balance in AOA. During the first 30% of the iterations, AOA performs a global search, followed by a local search. The exploration and exploitation phases are delineated using the transfer operator (

where,
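Assuming the definitions from the original AOA paper, the transfer operator and the density decreasing factor can be sketched as functions of the iteration counter:

```python
import math

def transfer_operator(t, t_max):
    """TF rises from about e**-1 to 1 over the run; AOA switches from
    exploration to exploitation once TF exceeds 0.5."""
    return math.exp((t - t_max) / t_max)

def density_factor(t, t_max):
    """d shrinks towards 0, narrowing the search around promising regions."""
    return math.exp((t_max - t) / t_max) - (t / t_max)

t_max = 100
print(round(transfer_operator(0, t_max), 4))  # 0.3679 -> exploration
print(transfer_operator(t_max, t_max))        # 1.0    -> exploitation
```

With these definitions, TF crosses the 0.5 threshold at roughly 30.7% of t_max, which is consistent with the 30% exploration phase stated above.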

During each iteration the density (

where,

Further, if

Then, the acceleration,

The position of

where,

Moreover, if

where,

Afterwards, if

Additionally, based upon the constant (

The Grey Wolf Optimisation (GWO) algorithm is inspired by the hunting behaviour and social hierarchy of grey wolf packs. Grey wolves live in groups called packs and maintain a strict social hierarchy within the group. The topmost leader of the group is called the

where,
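The standard GWO position update can be sketched as follows; the three leaders (alpha, beta, delta) are taken as given here, and the linearly decreasing coefficient a is the usual GWO convention rather than a detail quoted from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def gwo_step(wolves, alpha, beta, delta, a):
    """Standard GWO update: each wolf moves toward the mean of three
    candidate positions computed from the alpha, beta, and delta wolves.
    a decreases linearly from 2 to 0 over the iterations (|A| > 1 favours
    exploration, |A| < 1 exploitation)."""
    new = np.empty_like(wolves)
    for i, X in enumerate(wolves):
        candidates = []
        for leader in (alpha, beta, delta):
            A = 2 * a * rng.random(X.shape) - a
            C = 2 * rng.random(X.shape)
            D = np.abs(C * leader - X)        # distance to the leader
            candidates.append(leader - A * D)
        new[i] = np.mean(candidates, axis=0)  # X(t+1)
    return new

wolves = rng.random((5, 3))  # 5 wolves in a 3-D search space
next_pos = gwo_step(wolves, wolves[0], wolves[1], wolves[2], a=1.0)
print(next_pos.shape)  # (5, 3)
```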

GWO and AOA are both population-based algorithms. In the case of AOA, the exploration and exploitation behaviour is selected by

Q learning is a value-based RL technique, where Q denotes the action-value function. In this work, two actions and two states are defined: AOA and GWO. The Q table contains the Q values for each state-action pair. The learning agent and the environment are represented by the individuals of the population and the search space, respectively. During each operation, the Q table is updated using

The agent decides the next action based on the Q table values, which in turn depend on the reward signal. By selecting the action with the maximum Q value during the exploitation operation, the agent maximises its reward. When an operation results in a better cost function value, a positive reward is assigned; otherwise, the Q agent is penalised with a negative reward. If
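Putting the pieces together, the described selection mechanism can be sketched as below. The ±1 reward and the greedy action choice follow the text; the ε-greedy exploration rate and the convention that the executed algorithm becomes the next state are illustrative assumptions:

```python
import numpy as np

AOA, GWO = 0, 1  # the two states/actions defined in this work

def select_algorithm(Q, state, eps, rng):
    """Pick the algorithm with the maximum Q value, with a small
    probability eps of a random choice."""
    if rng.random() < eps:
        return int(rng.integers(2))
    return int(np.argmax(Q[state]))

def q_update(Q, state, action, improved, alpha=0.1, gamma=0.9):
    """Reward +1 if the chosen algorithm improved the cost, else -1."""
    reward = 1.0 if improved else -1.0
    next_state = action  # the executed algorithm becomes the next state
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max()
                                 - Q[state, action])
    return Q, next_state

rng = np.random.default_rng(1)
Q, state = np.zeros((2, 2)), AOA
for it in range(20):             # stand-in for the optimisation loop
    action = select_algorithm(Q, state, eps=0.1, rng=rng)
    improved = it % 2 == 0       # placeholder for a real cost comparison
    Q, state = q_update(Q, state, action, improved)
print(Q.shape)  # (2, 2)
```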

The present work formulates UAV path planning in a 3D environment, an NP-hard problem, as an optimisation problem, for which metaheuristics have been found to be the most effective approach. The prime objective of path planning for a UAV is to find an optimal flyable path from source to destination, without collisions, in minimum time and with minimum energy. To fulfil these requirements, the cost function is formulated based on [

where,

The cost function to be minimised for the optimal flight trajectory from start to target position is formulated considering the collisions with the obstacles in the environment, energy consumption, and travel time. The collision cost,

Since the length of the path is directly proportional to the energy consumption, the energy cost (

The deviation cost,

where,
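The three cost terms described above can be sketched as a weighted sum. The weights, the magnitude of the collision penalty, and the spherical obstacle model are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def path_cost(waypoints, obstacles, w=(1.0, 1.0, 1.0)):
    """Weighted sum of collision, energy (path length), and deviation costs."""
    pts = np.asarray(waypoints, dtype=float)
    segs = np.diff(pts, axis=0)

    # Energy cost: proportional to total path length, as stated above.
    energy = np.sum(np.linalg.norm(segs, axis=1))

    # Collision cost: heavy penalty per waypoint inside any obstacle,
    # with obstacles modelled as (centre, radius) spheres for simplicity.
    collision = 0.0
    for centre, radius in obstacles:
        d = np.linalg.norm(pts - np.asarray(centre), axis=1)
        collision += 1e3 * np.count_nonzero(d < radius)

    # Deviation cost: sum of turning angles between consecutive segments.
    deviation = 0.0
    for s1, s2 in zip(segs[:-1], segs[1:]):
        cosang = np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))
        deviation += np.arccos(np.clip(cosang, -1.0, 1.0))

    w1, w2, w3 = w
    return w1 * collision + w2 * energy + w3 * deviation

straight = [(0, 0, 0), (5, 0, 1), (10, 0, 2)]
print(path_cost(straight, obstacles=[((20, 20, 5), 3.0)]))  # ~10.2 (length only)
```

A straight path far from every obstacle pays only its length, while any waypoint inside an obstacle adds a dominant penalty, which is what drives the search toward collision-free routes.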

A real-world 3D UAV flight environment contains several threats. Hence, in order to prove the efficiency of the developed algorithm, a simulation environment that resembles a real-world civilian environment is required. Therefore, a 3D flight environment consisting of different kinds of obstacles is created for the simulations. The 3D environment comprises a Gaussian obstacle, a cone, hemispheres, and cylinders of various sizes. The configurations of the simulated obstacles are presented in

Obstacle | Base center | Radius (m) | Height (m)
---|---|---|---
Gaussian | (0, 0, 0) | – | 7
Cone | (−17, 7, 0) | 2.5 | 10
Hemisphere | (6, 7, 0) | 5 | –
Hemisphere | (20, 16, 0) | 5 | –
Cylinder | (12, 0, 0) | 3.5 | 8
Cylinder | (22, 0, 0) | 5 | 12
Cylinder | (25, 25, 0) | 2.3 | 9
Cylinder | (15, 8, 0) | 2 | 10
Cylinder | (10, 15, 0) | 4 | 8
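Using the cylinder and hemisphere entries from the table above, a point-level collision check for the simulated environment can be sketched as follows (the Gaussian hill and the cone are omitted for brevity, and the containment tests are simple geometric assumptions):

```python
# Obstacle layout taken from the table above: (base centre, radius, height).
cylinders = [((12, 0, 0), 3.5, 8), ((22, 0, 0), 5, 12), ((25, 25, 0), 2.3, 9),
             ((15, 8, 0), 2, 10), ((10, 15, 0), 4, 8)]
hemispheres = [((6, 7, 0), 5), ((20, 16, 0), 5)]

def inside_cylinder(p, base, r, h):
    """Within radius r of the axis in the XY plane, between base and top."""
    dx, dy = p[0] - base[0], p[1] - base[1]
    return dx * dx + dy * dy <= r * r and base[2] <= p[2] <= base[2] + h

def inside_hemisphere(p, base, r):
    """Within distance r of the base centre and above the ground plane."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, base))
    return d2 <= r * r and p[2] >= base[2]

def collides(p):
    return (any(inside_cylinder(p, c, r, h) for c, r, h in cylinders)
            or any(inside_hemisphere(p, c, r) for c, r in hemispheres))

print(collides((12, 0, 4)))   # True: inside the first cylinder
print(collides((0, -10, 5)))  # False: open space
```

A waypoint-level check of this kind is what the collision term of the cost function penalises.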

The performance of the developed QGA is evaluated on 22 benchmark functions as mentioned in

Function | Parameter | RLGWO | GWO | AOA | QGA | PSO | EAOA
---|---|---|---|---|---|---|---
F1 | Mean | 9.98E−34 | 9.95E−59 | 7.82E−14 | — | 1.04E−06 | 4.36E+04
 | SD | 4.23E−33 | 6.79E−59 | 3.67E−13 | 4.49E−72 | 3.63E−06 | 1.36E+04
F2 | Mean | 1.47E−14 | 1.10E−34 | 3.92E−05 | — | 8.87E+00 | 2.47E+11
 | SD | 6.06E−14 | 2.48E−34 | 1.84E−04 | 1.68E−41 | 8.29E+00 | 8.12E+11
F3 | Mean | 6.15E−03 | 8.77E−16 | 2.92E−25 | — | 1.53E+03 | 9.63E+04
 | SD | 2.74E−02 | 1.41E−14 | 1.37E−24 | 3.49E−52 | 2.84E+03 | 3.36E+04
F4 | Mean | 1.56E−04 | 1.44E−14 | 3.84E−23 | 2.37E−29 | 1.29E+00 | —
 | SD | 3.94E−04 | 2.60E−14 | 1.78E−22 | 7.31E−29 | 7.79E−01 | 0.00E+00
F5 | Mean | 2.80E+01 | 2.70E+01 | 2.75E+01 | — | 2.25E+04 | 1.67E+08
 | SD | 2.74E+00 | 4.75E−01 | 8.80E−01 | 8.03E−01 | 4.00E+04 | 7.63E+07
F6 | Mean | 2.89E+00 | 2.96E+00 | 1.51E+00 | — | 4.95E+02 | 5.04E+04
 | SD | 6.41E+00 | 3.40E−01 | 1.03E+00 | 4.49E−01 | 2.21E+03 | 1.36E+04
F7 | Mean | 2.21E−03 | 8.41E−04 | 5.02E−02 | — | 7.01E−01 | 2.47E+01
 | SD | 2.61E−03 | 3.44E−04 | 1.53E−01 | 4.28E−04 | 3.00E+00 | 2.82E+01
F8 | Mean | 1.72E+01 | 9.36E+01 | 1.14E−01 | — | 1.44E+02 | 3.56E+02
 | SD | 3.30E+01 | 2.33E−14 | 3.67E+01 | 5.12E−01 | 2.90E+01 | 5.63E+01
F9 | Mean | 3.28E−06 | 1.62E−14 | 1.33E−01 | — | 3.13E+00 | 2.00E+01
 | SD | 1.46E−05 | 2.15E−15 | 6.23E−01 | 2.60E−15 | 3.96E+00 | 7.69E−01
F10 | Mean | 6.01E−02 | 2.58E−03 | 8.57E−04 | — | 4.61E+00 | 4.50E+02
 | SD | 2.26E−01 | 2.46E−03 | 4.02E−03 | 4.93E−03 | 2.03E+01 | 1.20E+02
F11 | Mean | 7.75E−02 | 1.97E+00 | 8.78E−02 | — | 2.08E−01 | 3.97E+08
 | SD | 4.30E−02 | 3.30E−02 | 4.53E+00 | 4.36E−02 | 4.55E−01 | 2.04E+08
F12 | Mean | 1.18E+00 | 3.60E+00 | 1.21E+00 | — | 4.20E−01 | 7.38E+08
 | SD | 3.28E−01 | 2.12E−01 | 2.40E+00 | 2.71E−01 | 1.38E+00 | 4.45E+08
F13 | Mean | 4.96E+00 | 5.06E+00 | 4.91E+00 | — | 6.21E+00 | 4.64E+00
 | SD | 4.62E+00 | 4.88E+00 | 4.35E+00 | 5.22E+00 | 6.11E−01 | 3.05E+00
F14 | Mean | 8.10E−04 | 1.33E−03 | 3.88E−03 | 4.01E−04 | 6.81E−03 | 4.12E−03
 | SD | 4.39E−04 | 9.38E−03 | 7.43E−03 | — | 9.13E−03 | 4.52E−03
F15 | Mean | −1.03E+00 | −9.82E−01 | −1.02E+00 | — | — | —
 | SD | 2.40E−09 | 6.04E−02 | 2.36E−03 | 1.20E−09 | 2.28E−16 | 1.90E−02
F16 | Mean | 3.98E−01 | 3.87E−01 | 4.39E−01 | — | — | —
 | SD | 2.21E−07 | 0.00E+00 | 3.74E−02 | 2.49E−07 | 0.00E+00 | 4.19E−02
F17 | Mean | 2.86E+00 | 3.35E+00 | — | — | — | —
 | SD | 1.06E−05 | 0.00E+00 | 0.00E+00 | 5.21E−06 | 1.04E−15 | 7.82E−01
F18 | Mean | −3.86E+00 | −3.86E+00 | −3.68E+00 | — | −3.74E+00 | −3.71E+00
 | SD | 1.67E−03 | 2.40E−03 | 2.42E−03 | 3.03E−03 | 2.96E−01 | 1.33E−01
F19 | Mean | −3.28E+00 | −3.08E+00 | −3.28E+00 | — | −3.20E+00 | −2.12E+00
 | SD | 6.85E−02 | 7.64E−02 | 9.81E−02 | 5.93E−02 | 1.12E−01 | 5.07E−01
F20 | Mean | −8.76E+00 | −9.14E+00 | −5.39E+00 | −7.35E+00 | −6.76E+00 | −2.40E+00
 | SD | 2.21E+00 | 1.56E+00 | 3.76E+00 | 2.60E+00 | 3.57E+00 | 1.91E+00
F21 | Mean | — | −4.45E+00 | −8.81E+00 | — | −7.07E+00 | −2.98E+00
 | SD | 4.48E−04 | 4.81E+00 | 2.98E+00 | 2.50E+00 | 3.21E+00 | 1.61E+00
F22 | Mean | — | — | −4.67E+00 | −1.00E+01 | −7.02E+00 | −2.61E+00
 | SD | 7.02E−04 | 2.26E−04 | 3.40E+00 | 1.66E+00 | 4.00E+00 | 1.50E+00
Average rank | | 2.95 | 2.5 | 3.77 | 2.31 | 4.0 | 5.45
Total rank | | 3 | 2 | 4 | 1 | 5 | 6

Further, the optimum value attained for each benchmark function is computed across the simulated runs and tabulated in

Function | RLGWO | GWO | AOA | QGA | PSO | EAOA
---|---|---|---|---|---|---
F1 | 4.11E−48 | 3.79E−61 | 8.12E−127 | — | 6.24E−18 | 2.12E+04
F2 | 4.23E−30 | 7.23E−36 | 6.77E−70 | — | 2.00E−02 | 4.22E+01
F3 | 3.08E−13 | 1.41E−19 | 1.58E−100 | — | 1.36E+00 | 3.90E+04
F4 | 3.10E−10 | 1.39E−15 | 3.11E−50 | 4.97E−77 | 3.36E−01 | —
F5 | 2.63E+01 | 2.61E+01 | 2.60E+01 | — | 2.62E+01 | 1.55E+07
F6 | 4.90E−01 | 8.37E−01 | 5.01E−01 | 4.22E−13 | — | 2.67E+04
F7 | 6.33E−04 | 5.33E−04 | 7.30E−03 | — | 1.56E−02 | 6.04E+00
F8 | 4.63E+01 | 8.95E+01 | 2.42E+02 | — | — | —
F9 | 2.22E−14 | 1.51E−14 | 4.00E−14 | — | 1.15E−06 | 1.82E+01
F10 | 5.55E−16 | 8.11E−11 | 2.72E+02 | — | — | —
F11 | 3.48E−02 | 6.60E−03 | 9.10E−02 | — | 3.40E−02 | 1.02E+08
F12 | 9.62E−01 | 1.76E−01 | 2.17E+00 | — | 8.14E−01 | 4.71E+07
F13 | 1.00E+00 | — | — | — | — | —
F14 | 3.07E−04 | 3.07E−04 | 3.07E−04 | — | — | —
F15 | — | — | — | — | — | —
F16 | — | — | — | — | — | —
F17 | — | — | — | — | — | —
F18 | −3.85E+00 | −3.69E+00 | — | — | — | —
F19 | −2.91E+00 | — | — | — | — | —
F20 | −8.66E+00 | — | — | — | — | —
F21 | −5.84E+00 | — | — | — | — | —
F22 | −7.74E+00 | — | — | — | — | —

The Wilcoxon rank-sum test p-values obtained by comparing QGA pairwise with each of the other employed algorithms are tabulated as follows.

Function | QGA | QGA | QGA | QGA | QGA
---|---|---|---|---|---
F1 | 1.01E−07 | 6.8E−08 | 0.008352 | 6.8E−08 | 6.77E−08
F2 | 6.8E−08 | 6.8E−08 | 0.031517 | 6.8E−08 | 6.8E−08
F3 | 6.8E−08 | 6.8E−08 | 0.113551 | 6.8E−08 | 6.8E−08
F4 | 6.8E−08 | 6.8E−08 | 0.076431 | 6.8E−08 | 6.8E−08
F5 | 0.243594 | 0.147847 | 0.000758 | 6.8E−08 | 0.903116
F6 | 1.2E−06 | 1.58E−06 | 2.6E−05 | 6.8E−08 | 0.967635
F7 | 6.8E−08 | 0.06388 | 6.8E−08 | 6.8E−08 | 0.009786
F8 | 1.51E−08 | 0.499262 | 1.51E−08 | 1.51E−08 | 2.32E−06
F9 | 4.37E−08 | 2.56E−07 | 4.19E−08 | 4.37E−08 | 4.31E−08
F10 | 2.25E−07 | 0.573704 | 5.21E−06 | 1.51E−08 | 0.000466
F11 | 0.060111 | 0.000179 | 1.58E−06 | 6.8E−08 | 0.560852
F12 | 1.2E−06 | 1.43E−07 | 1.06E−07 | 6.8E−08 | 0.579218
F13 | 1.93E−07 | 0.281366 | 0.439459 | 0.903112 | 0.860405
F14 | 0.132627 | 0.029084 | 0.013273 | 1.43E−07 | 0.000563
F15 | 8.01E−09 | 2.78E−07 | 1.51E−08 | 6.8E−08 | 0.001349
F16 | 8.01E−09 | 8.01E−09 | 1.13E−08 | 6.8E−08 | 8.29E−05
F17 | 4.91E−08 | 2.99E−08 | 2.99E−08 | 6.8E−08 | 0.839232
F18 | 0.388414 | 0.175637 | 0.001944 | 6.8E−08 | 0.285305
F19 | 0.283263 | 0.143627 | 0.635038 | 6.8E−08 | 0.049864
F20 | 0.421973 | 0.207791 | 0.187628 | 1.57E−06 | 0.189523
F21 | 0.559911 | 0.19835 | 1.99E−05 | 9.13E−07 | 0.776391
F22 | 0.594823 | 0.009473 | 0.000196 | 9.17E−08 | 0.096196

Finally, the performance of all the employed algorithms is also compared based on convergence plots, samples of which are presented in

The developed algorithm is employed to generate a 3D flight path in the simulated environment. For all the algorithms, the maximum number of iterations is 1500, with a population size of 50. The results are compared with GWO, AOA, EAOA, RLGWO, and PSO. The

By comprehensive analysis of the convergence curves shown in

S. No. | Algorithm | Best cost function value | Mean value | Standard deviation
---|---|---|---|---
1 | PSO [34] | 145.44 | 200.87 | 6.62E−01
2 | GWO [37] | 142.13 | 180.52 | 5.24E−02
3 | AOA [22] | 116.84 | 180.46 | 8.51E−01
4 | EAOA [33] | 265.90 | 296.40 | 5.02E−02
5 | RLGWO [27] | 110.40 | 123.59 | 6.38E−02
6 | QGA | 104.00 | 118.26 | 6.22E−02

In the present work, a novel Q-learning controlled GWO-AOA (QGA) has been proposed and validated on 22 different benchmark functions based on convergence curves, the Friedman test, and the Wilcoxon rank-sum test. Additionally, the employability of QGA has been investigated for path planning of UAVs in a three-dimensional environment. The experimental results reveal that the adaptive algorithm selection mechanism based on Q learning equips QGA with better exploration-exploitation capabilities, owing to which it dominates all the compared algorithms in both the benchmark evaluations and the path planning of UAVs. Moreover, cubic-spline curves have been utilized to smooth the flight paths generated by the various adopted algorithms. The employability and superiority of the proposed algorithm have been comprehensively endorsed on the basis of various statistical results and convergence curve analysis. Although the developed QGA achieved the optimum cost for both the benchmarks and the path planning of UAVs, it is not the fastest, which opens a window for future research.

The authors extend their appreciation to Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R66), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.