In current research on task offloading and resource scheduling in vehicular networks, vehicles are commonly assumed to maintain a constant speed or to be relatively stationary, and the impact of speed variations on task offloading is often overlooked. It is also frequently assumed that vehicles can be accurately modeled during actual motion. However, in dynamic vehicular environments, both the tasks generated by vehicles and the vehicles' surroundings are constantly changing, making real-time modeling of actual dynamic vehicular network scenarios difficult. Taking these dynamics into account, this paper considers the real-time non-uniform movement of vehicles and proposes a vehicular task dynamic offloading and scheduling algorithm for single-task, multi-vehicle vehicular network scenarios, aiming to solve the dynamic decision-making problem in the task offloading process. The optimization objective is to minimize the average task completion time, formulated as a multi-constrained non-linear programming problem. Because of vehicle mobility, a constraint model is applied in the decision-making process to dynamically determine whether the communication range is sufficient for task offloading and transmission. Finally, the proposed vehicular task dynamic offloading and scheduling algorithm based on multi-agent deep deterministic policy gradient (MADDPG) is applied to obtain the optimal solution of the optimization problem. Simulation results show that the proposed algorithm achieves lower-latency task computation offloading. The average task completion time of the proposed algorithm is improved by 7.6% compared to the MADDPG scheme and by 51.1% compared to deep deterministic policy gradient (DDPG).

With the evolution of information and communication technology and the rapid iteration of industrial and technological change, the integration of emerging technologies such as the Internet of Vehicles, artificial intelligence, and 5G is accelerating the intelligent and networked upgrade of vehicles, giving rise to a large number of intelligent application services such as autonomous driving and real-time image- and video-assisted navigation [

The emerging Internet of Vehicles (IoV) supports several types of communication, including Vehicle-to-Vehicle (V2V), Vehicle-to-Cloud (V2C), and Vehicle-to-Infrastructure (V2I). V2C utilizes remote cloud resources to provide significant computational support, but has high transmission latency [

With the continuous increase in the number of vehicles in the IoV, traditional heuristic algorithms and swarm-intelligence models lack active learning ability and cannot adapt well to dynamic vehicular environments. If traditional optimization algorithms are used to solve the task offloading and resource allocation problems in the IoV, the computational complexity increases exponentially. Given these challenges, the rapidly developing field of reinforcement learning provides a new approach. By treating vehicles or edge nodes as intelligent agents and exploiting the communication and cooperation between agents to learn an optimal strategy, the dynamic allocation of resources, power selection, and task offloading decisions in the IoV can be handled effectively.

This article is based on the aforementioned research background and focuses on the difficulties and challenges faced in the context of the IoV, specifically the problems of task offloading and resource allocation in dynamic IoV scenarios. The article explores the collaboration between V2I and V2V communication modes in resource allocation. By introducing a multi-agent reinforcement learning algorithm in the IoV dynamic model, efficient task offloading and resource allocation can be achieved.

The main contributions of this paper are summarized as follows:

In this paper, we study the problem of task computation offloading in vehicular networks, where computationally intensive vehicular tasks are offloaded to vehicles with spare computational resources. We take into account the computational capabilities, communication range limits, and communication delays of the vehicles, aiming to optimize the task offloading decision so that the average task completion time is minimized.

In this paper, we propose a deep reinforcement learning-based dynamic offloading and scheduling scheme for Internet of Vehicles tasks to achieve dynamic decision-making for task offloading during non-uniform movement between vehicles. Simulation results show that the algorithm proposed in this paper can intelligently learn the optimal offloading strategy from the interaction with the dynamic network environment.

The remainder of this paper is organized as follows.

In order to ensure the demand for service quality or experience quality of users and effectively improve the network capacity of the vehicular ad hoc network (VANET) system, researchers have proposed various resource scheduling and allocation methods in the VANET communication network. Meanwhile, mobile edge computing (MEC) is also increasingly integrated into the resource scheduling of VANET.

Ning et al. [

In order to enhance network performance, an increasing number of researchers are exploring the integration of V2V and V2I communication, specifically aiming to improve the capacity performance of V2I links while satisfying the constraints of V2V link latency and reliability.

The research team of Wang et al. [

| Reference | Optimization objective | Solution |
|---|---|---|
| [ | Quality of experience | Two-sided matching and reinforcement learning |
| [ | Energy cost | Maximum-value-density-based heuristic allocation |
| [ | Offloading cost | Reinforcement learning |
| [ | Task execution time | Particle swarm optimization |
| [ | Remaining resources | Deep Q-learning |
| [ | Average application completion time | Reinforcement learning |
| [ | Total latency | Distributed algorithm |
| [ | Average energy consumption | Game theory and Lyapunov optimization |
| [ | Overall network capacity | Reinforcement learning |
| [ | Task processing delay | Generalized Benders decomposition and reinforcement learning |
| [ | Latency and energy | Q-learning |
| [ | Task execution time | Deep deterministic policy gradient |

Based on the above literature review, several issues remain in IoV resource scheduling. First, most studies do not consider a dynamically moving environment in the resource scheduling process. Although some works address offloading from moving vehicles or consider the impact of speed changes, most rest on one of two assumptions: vehicles travel at a constant speed, or they are relatively stationary during task offloading. Offloading decisions made under these assumptions ignore the dynamic decision-making problems of the complex, real-time-changing IoV environment and are therefore unrealistic. Second, most current research on applying reinforcement learning to IoV task offloading adopts single-agent algorithms. In actual IoV scenarios, however, there are many vehicles, and their real-time changes and dynamic environment exponentially enlarge the state and action spaces, which easily leads to dimension explosion during training and degrades the final offloading results.

We construct a task offloading model for vehicular networks that covers scenarios where tasks cannot be split and multiple vehicles are served by a single base station. As illustrated in

| Notation | Description |
|---|---|
|  | Service vehicles |
|  | Task vehicles |
|  | Wireless transmission rate |
|  | The wireless bandwidth of the system |
|  | Task size |
|  | The CPU cycle of the vehicle |
|  | The computational intensity of the task |
|  | The time-varying distance between the task vehicle and the service vehicle |
|  | The CPU cycle of the service vehicle |
|  | The CPU cycle of the MEC server in the base station |
|  | The transmission delay |
|  | The time delay for task execution on the base station |
|  | The execution delay on the vehicle |

Since the distance vehicles move during inter-vehicle task offloading is very limited, the channel state changes only slightly; moreover, the duration of information exchange and data transmission between vehicles and the base station is usually at the millisecond level. The vehicular network topology can therefore be considered stable and unchanging during resource scheduling and task offloading. Orthogonal frequency-division multiple access (OFDMA) is adopted for vehicular communication, so task vehicles connected to the same service vehicle or base station do not interfere with each other. When tasks need to be offloaded to service vehicles or the base station, the task data or the offloading computing instructions must be transmitted over the wireless link. The communication models for these two cases are as follows.

1) Offloading to Service Vehicles: Assuming the wireless channel for uploading task data between the task vehicle and the service vehicle is an additive white Gaussian noise (AWGN) channel, and since the tasks considered in this paper are indivisible, the formula for calculating the wireless data transmission rate
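A wireless transmission rate over an AWGN channel is commonly computed with the Shannon capacity formula; the following is a minimal sketch under that standard assumption. The bandwidth and uplink power below match the simulation parameters in this paper, but the channel gain value is purely illustrative:

```python
import math

def transmission_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Shannon capacity of an AWGN link: R = B * log2(1 + SNR)."""
    snr = tx_power_w * channel_gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)

# Illustrative values only: 100 MHz bandwidth and 1.5 W uplink power follow
# the simulation table, but the channel gain here is an assumption.
rate_bps = transmission_rate(bandwidth_hz=100e6, tx_power_w=1.5,
                             channel_gain=1e-8, noise_power_w=1e-13)
```

The rate grows logarithmically with the signal-to-noise ratio, so halving the distance-dependent channel gain reduces the rate by only one bandwidth-width of bits per second.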

The above formula assumes that d_{0} is not zero, as two vehicles cannot occupy the same position. In addition, the tasks studied in this paper are delay-sensitive, with delays typically on the order of milliseconds; assume a task duration of 50 ms. With a relative speed of 50 km/h between the two vehicles, the resulting change in the time-varying distance is about 0.695 m. This value can be ignored because, at a relative speed of 50 km/h, the minimum safe distance between vehicles should not be less than 50 m. Therefore, the following formula is obtained:
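The 0.695 m figure above can be checked directly from the values stated in the text:

```python
relative_speed_kmh = 50
task_duration_s = 0.050                       # 50 ms, from the text
relative_speed_ms = relative_speed_kmh / 3.6  # ~13.9 m/s
distance_change_m = relative_speed_ms * task_duration_s
# ~0.694 m: negligible next to the >= 50 m safe following distance
```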

Assuming the amount of data offloaded to the service vehicle is

The offloading process also involves uploading the computing instructions associated with the task. These instructions are usually tens to hundreds of bits, so their delay is negligible compared with the data transmission delay. After receiving the instructions, the service vehicle also undergoes a transformation of its own state (such as a coordinate transformation), which depends on the data received from the task vehicle. This process is formulated as follows:

2) Offloading to the Base Station: Assuming that the transmission power of the uplink channel during offloading from the task vehicle to the base station is

Therefore, within each time slot of the task offloading process, the variation of the time-varying distance can be regarded as a constant. Thus,

Similarly, with reference to

Once a computing task is generated on the task vehicle and enters decision-making, it is executed locally or offloaded to a surrounding service vehicle or the base station, according to the strategy and the task size.

When a task is executed locally or offloaded to a service vehicle,

When tasks are offloaded to the base station,

Considering the above communication and computation models, the calculation methods for the time delay of the three task offloading modes considered in this paper can be expressed as follows:
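The three delay expressions follow the standard edge-computing decomposition: local execution incurs computation time only, while the two offloading modes add an upload delay to the remote computation time. A sketch under that assumption (the exact formulas in the paper may include additional terms, e.g. for result return):

```python
def local_delay(task_bits, intensity, local_cpu_hz):
    """Local execution: computation time only."""
    return task_bits * intensity / local_cpu_hz

def v2v_delay(task_bits, intensity, rate_bps, sv_cpu_hz):
    """Offload to a service vehicle: upload time plus remote computation."""
    return task_bits / rate_bps + task_bits * intensity / sv_cpu_hz

def bs_delay(task_bits, intensity, rate_bps, mec_cpu_hz):
    """Offload to the base-station MEC server: upload plus remote computation."""
    return task_bits / rate_bps + task_bits * intensity / mec_cpu_hz
```

With the CPU frequencies from the simulation table (9, 10, and 15 GHz), offloading pays off only when the remote computation saving outweighs the transmission delay, which is exactly the trade-off the offloading decision must learn.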

Since this paper considers a two-lane road scenario, a vehicle may offload its task to a service vehicle traveling in the opposite direction, in which case the relative speed between the two vehicles can exceed 200 km/h. For a time slot with a large amount of task data, the delay from offloading to the service vehicle through to returning the result can be significant. Assuming a delay of 0.5 s, the relative distance between the two vehicles increases by nearly 28 m. Assuming the maximum distance for effective communication between vehicles is

In response to the dynamic situation caused by the rapid movement of vehicles, this paper proposes a dynamic constraint condition, which is expressed as follows:

For offloading tasks from a vehicle to the base station, the maximum distance allowed for effective communication between the vehicle and the base station is stipulated as

Under the dynamic model constraints above, after a vehicle executes the offloading strategy and obtains an offloading target, it does not immediately perform the offloading action. Instead, it evaluates whether the constraints are satisfied based on the task volume of the time slot, the locations of the vehicle and base station sensed from the environment, and the speeds of other vehicles. If the constraints are satisfied, it immediately performs the offloading action and awaits the result. Otherwise, the current target is discarded and the offloading strategy searches for a suboptimal target until one satisfying the constraints is found. Since this evaluation is performed locally with minimal computation, its delay can be ignored.
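The evaluate-then-fall-back procedure can be sketched as follows; the helper names and the numeric values are hypothetical, and the constraint shown (end-of-task distance within the 200 m communication range) is one instance of the dynamic constraint described above:

```python
def select_feasible_target(candidates, satisfies_constraints):
    """Walk candidates from best to worst (as ordered by the offloading
    policy) and return the first one that passes the constraint check."""
    for target in candidates:
        if satisfies_constraints(target):
            return target
    return None  # no feasible remote target: fall back to local execution

# Illustrative constraint: reject targets whose estimated distance at the
# end of the task would exceed the maximum vehicle-to-vehicle range.
MAX_RANGE_M = 200.0

def within_range(target):
    d_end = target["distance_m"] + target["rel_speed_ms"] * target["est_delay_s"]
    return d_end <= MAX_RANGE_M

best = select_feasible_target(
    [{"distance_m": 190.0, "rel_speed_ms": 55.0, "est_delay_s": 0.5},   # drifts out
     {"distance_m": 120.0, "rel_speed_ms": 10.0, "est_delay_s": 0.5}],  # stays in
    within_range)
```

The first candidate is rejected (190 m + 27.5 m > 200 m) and the second is chosen, mirroring the "discard and search for a suboptimal target" behavior described above.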

The communication model, computation model, and dynamic constraint model for joint vehicle and base-station offloading in vehicular networks have been presented above. This section formulates the task offloading and resource scheduling problem based on the described system model. To achieve offloading of computational tasks in a dynamic scenario and meet the computational resource demands of vehicular network tasks, a multi-constraint problem is formulated with the objective of minimizing the average completion time

In

Appropriate decisions are made at the beginning of each period for task offloading of the vehicle based on the current resources in the system, thus maximizing the long-term rewards [

(1) State

The global state space is made up of the state spaces of multiple task vehicles, and the state space

(2) Action

Since tasks are not split in the model considered in this section and each offloaded task is guaranteed to be processed within its time slot, each task vehicle can offload only a single task. The global action space is composed of the action spaces of the individual task vehicles, and each task vehicle agent executes its own action synchronously in each time slot, without sequential order, i.e., the system executes a joint action. The global action space and the action space of the task vehicles are defined as:

(3) Reward

The model is trained so that each task vehicle agent continuously maximizes its own long-term reward through interaction with the environment, driving the global reward to converge to the system optimum. For high-speed vehicles in the IoV, tasks are latency-sensitive, and the system goal of minimizing the average task processing latency is made explicit during Markov model construction; this is essentially minimizing the task execution latency of the system. The per-slot task processing latency is therefore included in the reward function, and the overall system reward is defined as
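Since the objective is to minimize the average task completion time while agents maximize reward, a natural reward shape is the negative of the per-slot delay; a minimal sketch under that assumption (the paper's exact reward definition may add weights or penalty terms):

```python
def system_reward(task_delays_s):
    """Per-slot system reward: negative average task completion time,
    so that maximizing reward minimizes the average delay."""
    return -sum(task_delays_s) / len(task_delays_s)
```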

The framework of the task dynamic offloading and scheduling scheme for vehicular networks based on MADDPG is shown in

Algorithm 1 shows the training process of the TDO-MADDPG algorithm, covering both experience collection and the use of experience for network updates. During experience collection, training on randomly sampled batches begins only once the experience replay buffer is full, and at the end of each training episode, each task vehicle agent softly updates its respective target network. Each training round minimizes the average task delay of the system as the optimization objective, subject to the system's computational resource constraints; it is also assumed that the topology of the whole vehicular network remains relatively stable during task offloading, which ensures the reliability and stability of data transmission to a certain extent.
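The replay-buffer gating and the soft target update described above are standard DDPG/MADDPG machinery; a framework-agnostic sketch (parameters shown as plain floats rather than network tensors, and the tau value is illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay; sampling begins only once it is full."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)  # oldest experience is evicted when full

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

In the actual algorithm each task vehicle agent holds actor/critic networks and their targets, and the soft update is applied parameter-wise at the end of each episode; the slow-moving targets stabilize the bootstrapped critic estimates.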

This section establishes a real-time dynamic environment for task offloading and resource allocation in vehicular ad hoc networks and conducts convergence analysis of the TDO-MADDPG algorithm. The effectiveness of the proposed solution is evaluated through an analysis of the average task completion time. The algorithm’s performance is trained and analyzed with respect to parameters, vehicle numbers, and task size. Finally, the algorithm is compared and discussed with DDPG, MADDPG, ATE (All Tasks for Edge), and ATSV (All Tasks for Server Vehicles) algorithms under the same scenario, evaluating convergence and average task completion time to demonstrate the advantages of the optimal strategy.

The simulation is implemented in Python and PyTorch, running on Ubuntu 18.04.6 with an Nvidia GeForce RTX 3080 graphics card. The vehicle movements are based on the CRAWDAD [

| Parameter | Value |
|---|---|
| Base station | 1 |
| Vehicles | 9–21 |
| Computing intensity M | 128 cycles/bit |
| Task vehicle CPU | 9 GHz |
| Service vehicle CPU | 10 GHz |
| MEC server CPU | 15 GHz |
| Task size | [60, 200] MB |
| Wireless bandwidth | 100 MHz |
| Base station height | 20 m |
| Noise power | 10^-13 W |
| Uplink transmission power | 1.5 W |
| Path loss factor | 3.4 |
| Max distance between vehicles | 200 m |
| Max distance between vehicles and base station | 5 km |

Training was conducted with regard to the size of the experience replay buffer and the sampled batch size, in order to investigate their impact on the convergence of the algorithm. As shown in

As shown in

To verify the superiority of the proposed algorithm, this section compares it with MADDPG, DDPG, offloading all tasks to the base station (ATE), and offloading all tasks to service vehicles (ATSV) in the same scenario, highlighting its performance advantages. In the MADDPG approach, the dynamic decision-making process during vehicle motion is not considered: the offloading target is generated by the algorithm's own decisions and the offloading action is executed directly. In the ATE approach, the vehicle offloads all tasks to the base station during dynamic movement, without considering local computing or offloading to service vehicles. In the ATSV approach, the base station is not considered; only local computing and offloading to neighboring service vehicles are used.

From

As shown in

This article focuses on a vehicular ad hoc network environment with fast-moving vehicles and combines multi-agent reinforcement learning with the computation resources of service vehicles and base stations to minimize the average task completion time. The dynamic task offloading and resource scheduling problems are studied. In a simulated environment, the proposed algorithm is verified and analyzed through dynamic decision-making, task offloading, and resource scheduling, and its feasibility is confirmed through convergence verification and training analysis of the relevant parameters. The training results show that the algorithm achieves optimal convergence performance and lower average task processing time for different numbers of vehicles and task sizes, demonstrating its stability. Finally, the proposed algorithm is compared with four other methods and, even as the number of vehicles increases, still achieves the lowest average task completion time, indicating a clear performance advantage. In future work, we will further consider the security and privacy of vehicular computation offloading and design secure task offloading schemes for the Internet of Vehicles.

The authors would like to express their sincere gratitude and appreciation to each other for their combined efforts and contributions throughout the course of this research.

The authors received no specific funding for this study.

The authors confirm contribution to the paper as follows: study conception and design: Yu Zhou, Yun Zhang, Guowei Li, Hang Yang, Wei Zhang, Ting Lyu and Yueqiang Xu; data collection and analysis: Yu Zhou, Yun Zhang, Guowei Li, Wei Zhang and Yueqiang Xu; draft manuscript preparation: Yueqiang Xu. All authors reviewed the results and approved the final version of the manuscript.

Not applicable.

The authors declare that they have no conflicts of interest to report regarding the present study.