The distributed flexible job shop scheduling problem (DFJSP) has attracted great attention with the growth of the global manufacturing industry. General DFJSP research only considers machine constraints and ignores worker constraints. Workers are a critical production factor, and effective utilization of worker resources can increase productivity. Meanwhile, energy consumption is a growing concern due to increasingly serious environmental issues. Therefore, the distributed flexible job shop scheduling problem with dual resource constraints (DFJSP-DRC), with the objectives of minimizing makespan and total energy consumption, is studied in this paper. To solve the problem, we present a multi-objective mathematical model for DFJSP-DRC and propose a Q-learning-based multi-objective grey wolf optimizer (Q-MOGWO). In Q-MOGWO, high-quality initial solutions are generated by a hybrid initialization strategy, and an improved active decoding strategy is designed to obtain the scheduling schemes. To further enhance the local search capability and expand the solution space, two wolf predation strategies and three critical-factory neighborhood structures based on Q-learning are proposed. These strategies and structures enable Q-MOGWO to explore the solution space more efficiently and thus find better Pareto solutions. The effectiveness of Q-MOGWO in addressing DFJSP-DRC is verified through comparison with four algorithms on 45 instances. The results reveal that Q-MOGWO outperforms the comparison algorithms in terms of solution quality.

With the continuous development of the manufacturing industry, the flexible job shop scheduling problem (FJSP) has become one of the core problems in production scheduling. The increasing diversification of market demands and the shortening of product life cycles have prompted manufacturing enterprises to gradually shift from the traditional single-factory production pattern to the multi-factory collaborative production pattern. Under this pattern, optimizing the processing sequence of multiple jobs on multiple machines has become an important challenge for the manufacturing industry. To meet this challenge, the manufacturing industry has begun seeking more efficient and flexible scheduling schemes. The distributed flexible job shop scheduling problem (DFJSP) has thus become the focus of both the manufacturing industry and academia. Although many scholars have studied DFJSP, most previous research focused on machine constraints within the factories, with relatively little attention paid to worker constraints.

In actual production, the collaborative work of workers and machines is crucial to improving production efficiency. By considering the skills of workers and the characteristics of machines, resources can be allocated more rationally to avoid waste of resources.

Therefore, the DFJSP with dual resource constraints (DFJSP-DRC) is investigated in this paper, with the objectives of minimizing makespan and total energy consumption. By considering the skills of workers and the characteristics of machines comprehensively, we construct a mathematical model of DFJSP-DRC, which enriches the existing DFJSP model and provides references for subsequent DFJSP research. By capturing the synergistic effect of worker and machine resources in actual production, this model not only enhances the practicality and applicability of scheduling theory but also provides new perspectives and tools for solving complex production scheduling problems.

It is difficult to obtain exact solutions for DFJSP-DRC using traditional mathematical methods. To obtain high-quality solutions, a novel and effective Q-learning-based multi-objective grey wolf optimizer (Q-MOGWO) is designed; it adds a local search strategy to the multi-objective grey wolf optimizer (MOGWO) and uses Q-learning to dynamically adjust the local search strategy according to the population state. The main contributions of this paper are as follows:

(1) DFJSP-DRC is studied, and a multi-objective mathematical model aiming at minimizing the makespan and total energy consumption is established.

(2) A hybrid population initialization strategy is introduced to enhance the quality and diversity of the initial population, and an improved active decoding strategy that fully utilizes the public idle time of machines and workers is designed to transform solutions into efficient scheduling schemes.

(3) Two improved wolf predation strategies and a local search strategy based on Q-learning are proposed to extend the search space of solutions.

The rest of this paper is organized as follows:

Some scholars have studied DFJSP with the objective of minimizing makespan.

To better simulate real production scenarios, both machines and workers should be considered in FJSP, which is referred to as the dual resource constrained flexible job shop scheduling problem (DRCFJSP). To solve the problem, Gong et al. [

The intelligent optimization algorithm is an effective method to solve different types of FJSP. Lin et al. [

Integrating reinforcement learning (RL) with intelligent optimization algorithms can effectively guide the intelligent optimization algorithms' search process, improve solution quality and accelerate convergence.

The above literature demonstrates that combining RL with intelligent optimization algorithms helps such algorithms find better solutions and accelerates convergence when solving various job shop scheduling problems. As a relatively new intelligent optimization algorithm, MOGWO has been used to solve a variety of scheduling problems.

There are n jobs, and each job J_{i} contains n_{i} operations. Every factory has a set of machines and a set of workers.

To explain DFJSP-DRC, a processing example is given in the table below. Each entry has the form "worker/time": for example, "1/3" means the operation can be processed on that machine by worker W_{1} with a processing time of 3, and "–" indicates that the machine cannot process this operation.

Jobs | Operations | M_{1} | M_{2} | M_{3}
---|---|---|---|---
J_{1} | O_{11} | 1/3, 3/4 | – | 1/2, 3/4
 | O_{12} | 3/7, 4/8 | 2/5, 4/2 | 3/2
J_{2} | O_{21} | 1/2, 4/3 | 4/5 | 1/6
 | O_{22} | 3/2, 4/3 | 4/6 | 3/4
 | O_{23} | – | 2/5 | 2/4
J_{3} | O_{31} | 1/2, 3/9, 4/5 | 4/5 | 1/2, 3/7
 | O_{32} | 1/4, 3/4 | – | 1/5, 3/7
 | O_{33} | 4/2 | 4/7 | –
J_{4} | O_{41} | 1/7, 3/8, 4/9 | 2/6, 4/5 | 1/5, 2/7, 3/9
 | O_{42} | 1/4 | 2/4 | 1/6, 2/5
 | O_{43} | 4/5 | 4/9 | –

Based on the problem definition and assumptions in

M_{ij}: The set of available machines to process O_{ij}

W_{ijk}: The set of available workers who can operate machine M_{k} to process O_{ij}

F_{i}: The set of available factories to process J_{i}

J_{i}: The i-th job

M_{k}: The k-th machine

W_{s}: The s-th worker

n_{i}: Total number of operations for J_{i}

O_{ij}: The j-th operation of J_{i}

t_{ijks}: The processing time of O_{ij} processed by worker W_{s} on machine M_{k}

S_{ij}: Start processing time of O_{ij}

E_{ij}: End processing time of O_{ij}

t_{ijksf}: The processing time of O_{ij} processed by worker W_{s} on machine M_{k} in factory f

S_{ijf}: Start processing time of O_{ij} in factory f

E_{ijf}: End processing time of O_{ij} in factory f

C_{ijks}: Completion time of O_{ij} on machine M_{k} with worker W_{s}

C_{i}: Completion time of J_{i}

C_{max}: Makespan

PE_{k}: Processing energy consumption per unit time for machine M_{k}

IE_{k}: Idle energy consumption per unit time for machine M_{k}

X_{if}: 0-1 decision variable; takes value 1 when J_{i} is processed in factory f, and 0 otherwise

Y_{ijksf}: 0-1 decision variable; takes value 1 when O_{ij} is processed in factory f on machine M_{k} by worker W_{s}, and 0 otherwise

Z_{i'j'ijkf}: 0-1 decision variable; takes value 1 when O_{ij} and O_{i'j'} are processed by machine M_{k} in factory f and O_{i'j'} is processed immediately before O_{ij}; otherwise, the value is 0

V_{i'j'ijsf}: 0-1 decision variable; takes value 1 when O_{ij} and O_{i'j'} are processed by worker W_{s} in factory f and O_{i'j'} is processed immediately before O_{ij}; otherwise, the value is 0

Combining the symbol description and problem definition, the mathematical model of DFJSP-DRC is developed. The first objective is the makespan, i.e., the maximum completion time over all jobs: C_{max} = max_{i} C_{i}.
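Written out explicitly, the two objectives take the following common form (a sketch using the notation above; T^{proc}_{kf} and T^{idle}_{kf} denote the total processing and idle time of machine M_{k} in factory f, and the paper's exact expressions and constraints are omitted in this extraction):

```latex
\min f_1 = C_{\max} = \max_{i} C_i, \qquad
\min f_2 = \mathit{TEC} = \sum_{f}\sum_{k}\left( PE_k \, T^{\mathrm{proc}}_{kf} + IE_k \, T^{\mathrm{idle}}_{kf} \right)
```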

In the grey wolf optimizer (GWO), the grey wolf population is divided into four types based on the social leadership mechanism: α, β, δ and ω wolves. The α, β and δ wolves are the head wolves that guide the search, while the ω wolves follow them.

MOGWO is built on GWO by adding two components. The first component is an archive that stores the non-dominated Pareto optimal solutions acquired thus far. The second component is a leader selection strategy employed to aid in selecting the head wolves (α, β and δ) from the archive.

The pseudo-code of Q-MOGWO is described in Algorithm 1. The main steps of Q-MOGWO include four-layer encoding, active decoding based on public idle time, hybrid initialization strategy, wolf pack search strategy and neighborhood structure based on Q-learning. The iteration of Q-MOGWO is as follows. Firstly, the initial population is generated by a hybrid initialization strategy and the head wolves are selected in the population. Secondly, the head wolves lead the evolution of the population, and the evolved population and the initial population are merged by the elite strategy to obtain the external archive and the new generation of population. Finally, the local search strategy based on Q-learning is applied to the external archive, and the external archive is updated by the elite strategy.

The feasible solutions for DFJSP-DRC are represented using a four-layer coding scheme, which includes vectors for the operations sequence (OS), factories sequence (FS), machines sequence (MS) and workers sequence (WS). A four-layer encoding scheme with 4 jobs and 3 factories is shown in the example: J_{1} and J_{4} have two operations, while J_{2} and J_{3} have three operations. Each factory has three machines and two workers. OS consists of job indices, where each job i appears n_{i} times; the 2 in the fifth position indicates operation O_{22}. The second layer is FS, in which each number represents the processing factory for each operation. The sequence length of FS is the same as that of OS, and in the example J_{2} and J_{4} are processed in F_{1}, J_{1} is processed in F_{2}, and J_{3} is processed in F_{3}. The MS and WS structure is similar to the FS, with the third number in the MS and WS indicating that O_{41} is processed by machine 1 and worker 1.
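The OS convention above (the k-th occurrence of job i denotes operation O_{ik}) can be shown in a few lines. The sample OS vector here is illustrative, chosen only so that its fifth element is the second occurrence of job 2, matching the O_{22} example in the text:

```python
def decode_os(os_seq):
    """Map an OS vector to (job, operation) pairs: the k-th occurrence
    of job i in the sequence denotes operation O_{ik}."""
    seen = {}
    ops = []
    for job in os_seq:
        seen[job] = seen.get(job, 0) + 1   # count occurrences of this job so far
        ops.append((job, seen[job]))
    return ops

# Hypothetical OS: position 5 holds the second 2, i.e. operation O_22
print(decode_os([1, 2, 4, 3, 2]))
```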

A good decoding strategy can not only rationalize the arrangement of jobs, machines and workers, but also yield a high-quality scheduling scheme. Based on the literature of Kacem et al. [

Case 1: As shown in the figure, the public idle time slot of the machine and worker is [T_{s}, T_{e}], with T_{e} − E_{i(j-1)} ≥ t_{ijks} and E_{i(j-1)} ≥ T_{s}. In this case, the processing time slot of the operation is denoted as [E_{i(j-1)}, E_{i(j-1)} + t_{ijks}].

Case 2: As shown in the figure, the public idle time slot of the machine and worker is [T_{s}, T_{e}], with T_{e} − T_{s} ≥ t_{ijks} and T_{s} ≥ E_{i(j-1)}. In this case, the processing time slot of the operation is denoted as [T_{s}, T_{s} + t_{ijks}].
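The two cases collapse into a single feasibility test: the operation starts at max(T_s, E_{i(j-1)}), and the slot is usable if the processing time still fits before T_e. A minimal sketch (function and parameter names are illustrative):

```python
def earliest_start(idle_slots, prev_end, p_time):
    """Earliest feasible start within the public idle slots of the
    chosen machine-worker pair.

    idle_slots: time-ordered (T_s, T_e) intervals shared by machine and worker
    prev_end:   completion time E_{i(j-1)} of the preceding operation
    p_time:     processing time t_ijks
    """
    for t_s, t_e in idle_slots:
        start = max(t_s, prev_end)   # Case 1: prev_end >= T_s; Case 2: T_s >= prev_end
        if start + p_time <= t_e:    # the slot is long enough
            return start
    return None                      # no public idle slot fits; append at the end
```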

(1) Initialization strategy for the OS. To ensure the diversity and randomness of the population, the initialization coding for OS adopts the positional ascending rule. Firstly, generate a list of basic OS. Secondly, generate a random number ranging from 0 to 1 for each element in the OS. Finally, the OS is rearranged in ascending order of these random numbers to obtain the initial OS.

(2) Initialization strategy for the FS. The following two strategies each account for 50% of the population size. The first strategy prioritizes assignment to the factory with the fewest jobs; if there are multiple optional factories, one is chosen randomly. The second strategy randomly assigns jobs to a factory.

(3) Initialization strategy for MS and WS. The processing time for DFJSP-DRC depends on both the machine and the worker. Taking into account the time differences that arise when different workers operate the same machine, a principle of machine-worker integration is formulated: first determine the set of available machines for an operation, and then determine the available workers for each machine. For example, the available machines for O_{21} of J_{2} are M_{1} and M_{2}; M_{1} and M_{2} can be operated by worker W_{1}, and M_{2} can also be operated by worker W_{2}. The set of available machine-worker pairs for O_{21} of J_{2} is therefore [(M_{1}, W_{1}), (M_{2}, W_{1}), (M_{2}, W_{2})]. Each operation has three strategies: randomly selecting a machine-worker pair, selecting the pair with the shortest processing time, and selecting the pair with the least energy consumption.
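Two of the initialization rules above can be sketched directly: the random-key rule for OS, and the three machine-worker selection strategies. The dictionaries of times and energies below are illustrative placeholders, not data from the paper:

```python
import random

def init_os(base_os, rng):
    """Positional ascending rule: attach a random key in [0, 1) to each
    element of the basic OS, then sort the keys ascending."""
    keys = [rng.random() for _ in base_os]
    return [job for _, job in sorted(zip(keys, base_os))]

def pick_machine_worker(pairs, mode, proc_time=None, energy=None, rng=None):
    """Choose a (machine, worker) pair using one of the three strategies."""
    if mode == "random":
        return rng.choice(pairs)
    if mode == "min_time":
        return min(pairs, key=lambda mw: proc_time[mw])
    return min(pairs, key=lambda mw: energy[mw])   # "min_energy"
```

For example, with the pair set [(M1, W1), (M2, W1), (M2, W2)] from the text and hypothetical processing times, the "min_time" strategy returns the fastest combination.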

MOGWO mimics the grey wolf population predation strategy, utilizing the three head wolves to guide the position update of the population. However, this strategy cannot be applied directly to DFJSP-DRC. Therefore, in Q-MOGWO, two modified search operators are adopted for the global search to ensure the feasibility of DFJSP-DRC solutions. The social leadership mechanism proposed by Lu et al. [

The first search operator comprises improved precedence operation crossover (IPOX). P_{1} and P_{2} represent the two parent chromosomes that undergo crossover to generate offspring C_{1} and C_{2}, as shown in

The second search operator is the improved IPOX, which executes the crossover for OS, MS and WS, as shown in

The main components of RL include the agent, environment, actions, rewards and states. The agent in an RL algorithm obtains as much reward as possible through trial-and-error interaction with the environment. The agent takes an action according to its state s_{t} at time t, receiving a reward r_{t+1} and transitioning to state s_{t+1}.

Q-learning is an effective algorithm that improves the solution diversity of the algorithm by choosing appropriate local search operators during the iteration. Q-learning is a greedy algorithm in which the agent selects the action with the highest Q value to maximize rewards. The agent fine-tunes the disparity between the actual and estimated Q values by computing the difference between them. The learning rate (α) controls how strongly new information updates the current estimate, the discount factor (γ) determines how much the later state s_{t+1} influences the current state s_{t}, and r_{t} is the reward after performing an action a_{t}. The Q value is updated according to the standard Q-learning rule.
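The standard update named above, Q(s,a) ← Q(s,a) + α(r + γ·max_a' Q(s',a') − Q(s,a)), is a one-liner; the table layout here (a list of per-state action rows) is an illustrative choice:

```python
def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning update on a tabular Q function.
    Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))"""
    td_target = reward + gamma * max(Q[s_next])   # best estimated future value
    Q[s][a] += alpha * (td_target - Q[s][a])      # move toward the TD target
    return Q[s][a]
```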

In Q-MOGWO, the PF is the set of optimal solutions, which can reflect the comprehensive ability of the algorithm. Q-learning guides the algorithm to choose the optimal local search strategy. Therefore, the solution set in the external archive acts as an agent to reflect the success of the local search strategy.

The state change gives feedback to the agent and determines whether the performed action improves the overall quality of the PF. In Q-MOGWO, MOGWO is viewed as the environment. To better construct the state of the environment, two aspects are considered: one is the comprehensive performance of the PF, and the other is the degree of excellence of the newly obtained PF.

Whether or not the PF improves is measured by the change of IGD between consecutive iterations, ∆IGD = IGD_{i} − IGD_{i-1},

where IGD_{i} indicates the IGD value of the PF at the i-th iteration.

There are three outcomes of ∆IGD (∆IGD < 0, ∆IGD = 0 and ∆IGD > 0), which are combined with the dominance relation between PF_{i} and PF_{i-1} to define six states: (1) State 1: ∆IGD < 0 and PF_{i} does not dominate PF_{i-1}; (2) State 2: ∆IGD < 0 and PF_{i} dominates PF_{i-1}; (3) State 3: ∆IGD = 0 and PF_{i} dominates PF_{i-1}; (4) State 4: ∆IGD = 0 and PF_{i} does not dominate PF_{i-1}; (5) State 5: ∆IGD > 0 and PF_{i} dominates PF_{i-1}; (6) State 6: ∆IGD > 0 and PF_{i} does not dominate PF_{i-1}.

Upon performing an action, the agent receives a reward, which may be positive or negative. The definition of reward is as

Local search is a crucial technique for improving resource utilization, but it consumes substantial computing resources, and executing local search strategies at random leads to a low success rate. RL offers selection strategies that guide the agent toward the local search strategy with the highest likelihood of success.

Based on the literature of Zhang et al. [

Combining the two local search operators, three local search strategies are proposed. Local search strategy 1: select the factory with the maximum makespan. Local search strategy 2: select the factory with the maximum energy consumption. Local search strategy 3: select a factory randomly. Based on the above description, an adaptive local search strategy based on Q-learning (Q-ALS) is designed, and Algorithm 3 provides the corresponding pseudo-code.

A series of experimental instances is designed to assess Q-MOGWO's performance. Q-MOGWO and the comparison algorithms are coded in Python and run on an Intel Core i7-8550 CPU @ 1.80 GHz with 8 GB RAM. For fairness, each algorithm is run 20 times independently, and the averages of the collected results are used for performance comparison.

MOEA/D [

(1) Spread measures how evenly the obtained solutions are distributed along the front, and its formula is:
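Assuming the usual definition of the Spread metric (d_f and d_l are the Euclidean distances between the extreme points of the true PF and the boundary solutions of the obtained set, and d̄ is the mean of the d_i), the formula reads:

```latex
\Delta = \frac{d_f + d_l + \sum_{i=1}^{N-1} \left| d_i - \bar{d} \right|}{d_f + d_l + (N-1)\,\bar{d}}
```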

where d_{i} represents the Euclidean distance between each point in the real PF and its nearest neighbor within the front, and d̄ is the mean of all d_{i}.

(2) HV serves as a metric for assessing the overall performance of an algorithm. It quantifies the volume or area within the objective space enclosed by the resulting non-dominant solution set and reference points. The formula of HV is:
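For the bi-objective case studied here, HV can be written as follows, assuming the usual Lebesgue-measure definition with reference point r = (r_1, r_2) dominated by all solutions in the obtained set S:

```latex
HV(S) = \lambda\left( \bigcup_{x \in S} \left[ f_1(x),\, r_1 \right] \times \left[ f_2(x),\, r_2 \right] \right)
```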

where the reference point is chosen in the objective space so that it is dominated by all obtained solutions.

Since there is no existing instance of DFJSP-DRC, the test instances are extended from the flexible job shop scheduling benchmark. The processing time t_{ijks} is randomly generated within [t_{ij}, t_{ij} + Δ_{ij}], where the operation processing time t_{ij} is given by the benchmark instance and Δ_{ij} ∈ [2, 8].
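The extension rule can be sketched in a few lines; the uniform-integer sampling and the function name are assumptions consistent with the ranges stated above:

```python
import random

def gen_times(t_ij, n_samples, rng, delta_low=2, delta_high=8):
    """Extend a benchmark processing time t_ij to n_samples
    machine/worker-dependent times t_ijks, each drawn uniformly from
    [t_ij, t_ij + delta_ij] with delta_ij in [delta_low, delta_high]."""
    delta = rng.randint(delta_low, delta_high)   # delta_ij for this operation
    return [rng.randint(t_ij, t_ij + delta) for _ in range(n_samples)]
```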

The parameter configuration affects the algorithm's performance in solving the problem. Q-MOGWO contains three primary parameters: the population size, the length of the external archive and the maximum number of iterations. Three levels are designed for each parameter, as shown below.

Level | Population size | External archive length | Number of iterations
---|---|---|---
1 | 100 | 30 | 100
2 | 200 | 50 | 200
3 | 300 | 80 | 300

Q-MOGWO runs 20 times under each parameter combination to ensure fairness, and the average IGD values from these 20 runs are collected. Experiments are conducted on the Mk-3-01 instance, employing IGD to evaluate parameter combinations, as presented in

Number | Population size | External archive length | Number of iterations | IGD
---|---|---|---|---
1 | 1 | 1 | 1 | 72.918
2 | 1 | 2 | 2 | 53.365
3 | 1 | 3 | 3 | 42.634
4 | 2 | 1 | 2 | 33.547
5 | 2 | 2 | 3 | 31.562
6 | 2 | 3 | 1 | 45.074
7 | 3 | 1 | 3 | 24.532
8 | 3 | 2 | 1 | 50.612
9 | 3 | 3 | 2 | 25.169
Level 1 | 56.306 | 43.666 | 56.201 |
Level 2 | 36.728 | 45.180 | 37.360 |
Level 3 | 33.438 | 37.626 | 32.909 |
Range | 22.868 | 7.554 | 23.292 |
Rank | 2 | 3 | 1 |

The proposed strategy's effectiveness is validated through experiments on 15 instances. Q-MOGWO1, Q-MOGWO2 and Q-MOGWO3 denote variants that fix the local search to the maximum-makespan factory, the maximum-energy-consumption factory and a randomly selected factory, respectively. The IGD and Spread values of Q-MOGWO, Q-MOGWO1, Q-MOGWO2 and Q-MOGWO3 are shown in

IGD | Spread | |||||||
---|---|---|---|---|---|---|---|---|

Q-MOGWO | Q-MOGWO1 | Q-MOGWO2 | Q-MOGWO3 | Q-MOGWO | Q-MOGWO1 | Q-MOGWO2 | Q-MOGWO3 | |

MK-2-01 | 55.234 | 44.184 | 47.315 | 1.409 | 0.900 | 1.226 | ||

MK-2-04 | 97.492 | 87.787 | 101.213 | 1.157 | 0.984 | 1.135 | ||

MK-2-09 | 594.551 | 331.678 | 395.348 | 1.800 | 0.964 | 1.166 | ||

MK-2-12 | 480.252 | 568.373 | 628.657 | 1.236 | 1.482 | 1.015 | ||

MK-2-15 | 565.720 | 614.396 | 641.083 | 1.046 | 1.077 | 1.095 | ||

MK-3-01 | 66.432 | 70.455 | 25.947 | 1.155 | 1.023 | 0.935 | ||

MK-3-04 | 111.541 | 72.101 | 58.187 | 1.105 | 1.008 | 0.987 | ||

MK-3-09 | 482.553 | 315.329 | 189.537 | 1.415 | 1.103 | 1.115 | ||

MK-3-12 | 459.829 | 439.840 | 401.252 | 1.211 | 0.906 | 1.035 | ||

MK-3-15 | 372.663 | 758.286 | 809.903 | 1.013 | 1.007 | 0.975 | ||

MK-4-01 | 55.706 | 55.242 | 44.981 | 0.905 | 0.919 | 0.961 | ||

MK-4-04 | 67.482 | 106.403 | 69.193 | 1.153 | 1.043 | 1.093 | ||

MK-4-09 | 219.599 | 205.870 | 289.636 | 0.898 | 0.998 | 0.976 | ||

MK-4-12 | 409.506 | 398.500 | 443.178 | 1.141 | 1.089 | 1.064 | ||

MK-4-15 | 291.637 | 489.670 | 709.550 | 0.995 | 0.916 | 0.955 |

To further evaluate the effectiveness of Q-MOGWO, four multi-objective optimization algorithms, MOEA/D, MA, NSGA-II and MOGWO, are selected as comparison algorithms. The parameter settings of the comparison algorithms refer to the corresponding literature and are listed in the following table.

Algorithm | Parameter setting |
---|---|

MOEA/D | Population_num = 300, generation_num = 300, pc_max = 0.8, pm_max = 0.1, pc_min = 0.4, pm_min = 0.02, T = 10, H = 300 |

MA | Population_num = 300, generation_num = 300, pc_max = 0.8, pm_max = 0.1, pc_min = 0.4, pm_min = 0.02 |

NSGA-II | Population_num = 300, generation_num = 300, pc_max = 0.8, pm_max = 0.1, pc_min = 0.4, pm_min = 0.02, external archive length = 80 |

MOGWO | Population_num = 300, generation_num = 300, external archive length = 80 |

Q-MOGWO | MA | NSGA-II | MOEA/D | MOGWO | |
---|---|---|---|---|---|

MK-2-01 | 79.778 | 85.983 | 66.391 | 138.772 | |

MK-2-02 | 35.413 | 72.422 | 112.391 | 82.105 | |

MK-2-03 | 361.266 | 328.452 | 1352.319 | 656.802 | |

MK-2-04 | 272.001 | 140.738 | 170.961 | 118.303 | |

MK-2-05 | 245.971 | 167.056 | 424.825 | 157.336 | |

MK-2-06 | 101.209 | 284.131 | 786.891 | 342.987 | |

MK-2-07 | 265.343 | 125.819 | 572.337 | 221.306 | |

MK-2-08 | 338.599 | 1392.342 | 3738.575 | 1667.969 | |

MK-2-09 | 263.779 | 1925.326 | 4696.575 | 1998.552 | |

MK-2-10 | 250.418 | 1167.949 | 2786.030 | 1114.156 | |

MK-2-11 | 480.000 | 878.320 | 1948.067 | 918.455 | |

MK-2-12 | 696.596 | 1247.932 | 4750.929 | 1931.157 | |

MK-2-13 | 361.226 | 2397.251 | 7700.891 | 2636.949 | |

MK-2-14 | 559.035 | 3805.628 | 10950.066 | 10860.940 | |

MK-2-15 | 1311.067 | 3161.046 | 10340.673 | 5665.504 | |

MK-3-01 | 70.151 | 82.388 | 62.633 | 72.585 | |

MK-3-02 | 39.680 | 106.610 | 77.532 | 124.279 | |

MK-3-03 | 233.989 | 345.571 | 193.757 | 674.889 | |

MK-3-04 | 268.067 | 169.470 | 88.779 | 167.414 | |

MK-3-05 | 223.988 | 230.011 | 183.460 | 254.496 | |

MK-3-06 | 101.561 | 308.865 | 141.637 | 441.956 | |

MK-3-07 | 274.331 | 372.838 | 80.358 | 517.304 | |

MK-3-08 | 311.758 | 1341.182 | 611.138 | 1960.633 | |

MK-3-09 | 333.318 | 1357.865 | 1185.981 | 1883.171 | |

MK-3-10 | 647.407 | 1280.364 | 1227.488 | 1905.381 | |

MK-3-11 | 410.461 | 703.069 | 491.003 | 720.391 | |

MK-3-12 | 553.592 | 1719.839 | 727.915 | 2128.851 | |

MK-3-13 | 392.819 | 1533.313 | 1133.513 | 2903.875 | |

MK-3-14 | 1037.839 | 2777.246 | 1459.384 | 3462.166 | |

MK-3-15 | 513.426 | 3586.269 | 1919.109 | 4692.594 | |

MK-4-01 | 66.681 | 72.205 | 113.673 | 102.103 | |

MK-4-02 | 40.364 | 117.182 | 97.689 | 148.117 | |

MK-4-03 | 203.737 | 554.247 | 1278.462 | 619.975 | |

MK-4-04 | 165.024 | 86.888 | 273.664 | 199.358 | |

MK-4-05 | 195.211 | 240.223 | 465.922 | 313.644 | |

MK-4-06 | 97.620 | 434.910 | 864.321 | 513.480 | |

MK-4-07 | 162.190 | 246.298 | 514.142 | 390.048 | |

MK-4-08 | 403.572 | 1165.692 | 2597.720 | 1388.311 | |

MK-4-09 | 406.639 | 1777.226 | 4007.228 | 2050.637 | |

MK-4-10 | 503.998 | 836.156 | 2440.588 | 957.855 | |

MK-4-11 | 414.475 | 604.326 | 1742.325 | 739.629 | |

MK-4-12 | 530.439 | 1658.010 | 3358.178 | 2171.534 | |

MK-4-13 | 468.675 | 1948.452 | 7030.645 | 2519.904 | |

MK-4-14 | 913.890 | 3554.755 | 8028.150 | 4735.741 | |

MK-4-15 | 682.696 | 4287.531 | 9764.774 | 4645.277 |

Q-MOGWO | MA | MOEA/D | NSGA-II | MOGWO | |
---|---|---|---|---|---|

MK-2-01 | 0.299 | 0.325 | 0.360 | 0.402 | |

MK-2-02 | 0.195 | 0.177 | 0.232 | 0.224 | |

MK-2-03 | 0.121 | 0.102 | 0.196 | 0.162 | |

MK-2-04 | 0.210 | 0.318 | 0.358 | 0.330 | |

MK-2-05 | 0.221 | 0.181 | 0.280 | 0.292 | |

MK-2-06 | 0.086 | 0.063 | 0.116 | 0.107 | |

MK-2-07 | 0.187 | 0.165 | 0.288 | 0.263 | |

MK-2-08 | 0.235 | 0.085 | 0.228 | 0.216 | |

MK-2-09 | 0.210 | 0.044 | 0.181 | 0.172 | |

MK-2-10 | 0.231 | 0.066 | 0.209 | 0.209 | |

MK-2-11 | 0.297 | 0.099 | 0.277 | 0.269 | |

MK-2-12 | 0.215 | 0.066 | 0.232 | 0.209 | |

MK-2-13 | 0.243 | 0.062 | 0.218 | 0.211 | |

MK-2-14 | 0.190 | 0.050 | 0.198 | 0.167 | |

MK-2-15 | 0.231 | 0.062 | 0.203 | 0.193 | |

MK-3-01 | 0.299 | 0.325 | 0.360 | 0.371 | |

MK-3-02 | 0.438 | 0.391 | 0.413 | 0.394 | |

MK-3-03 | 0.171 | 0.212 | 0.244 | 0.198 | |

MK-3-04 | 0.210 | 0.318 | 0.350 | 0.330 | |

MK-3-05 | 0.235 | 0.249 | 0.321 | 0.319 | |

MK-3-06 | 0.120 | 0.151 | 0.144 | 0.135 | |

MK-3-07 | 0.275 | 0.372 | 0.368 | 0.325 | |

MK-3-08 | 0.175 | 0.152 | 0.181 | 0.172 | |

MK-3-09 | 0.232 | 0.179 | 0.235 | 0.200 | |

MK-3-10 | 0.367 | 0.301 | 0.312 | 0.373 | |

MK-3-11 | 0.178 | 0.173 | 0.195 | 0.196 | |

MK-3-12 | 0.150 | 0.146 | 0.179 | 0.171 | |

MK-3-13 | 0.179 | 0.156 | 0.218 | 0.165 | |

MK-3-14 | 0.112 | 0.116 | 0.143 | 0.142 | |

MK-3-15 | 0.147 | 0.122 | 0.144 | 0.125 | |

MK-4-01 | 0.345 | 0.346 | 0.416 | 0.385 | |

MK-4-02 | 0.330 | 0.264 | 0.314 | 0.294 | |

MK-4-03 | 0.194 | 0.206 | 0.287 | 0.271 | |

MK-4-04 | 0.228 | 0.251 | 0.354 | 0.301 | |

MK-4-05 | 0.348 | 0.290 | 0.400 | 0.369 | |

MK-4-06 | 0.093 | 0.088 | 0.126 | 0.119 | |

MK-4-07 | 0.250 | 0.225 | 0.343 | 0.304 | |

MK-4-08 | 0.297 | 0.134 | 0.296 | 0.297 | |

MK-4-09 | 0.316 | 0.165 | 0.293 | 0.282 | |

MK-4-10 | 0.343 | 0.135 | 0.326 | 0.319 | |

MK-4-11 | 0.386 | 0.168 | 0.393 | 0.385 | |

MK-4-12 | 0.283 | 0.136 | 0.303 | 0.284 | |

MK-4-13 | 0.309 | 0.147 | 0.319 | 0.299 | |

MK-4-14 | 0.248 | 0.097 | 0.261 | 0.250 | |

MK-4-15 | 0.291 | 0.116 | 0.275 | 0.266 |

To visualize the performance of Q-MOGWO, NSGA-II, MOEA/D, MA and MOGWO, 6 instances (Mk-3-01, Mk-3-08, Mk-3-15, Mk-4-01, Mk-4-08, Mk-4-15) with different scales are selected, and the Pareto front obtained from one run of each algorithm for each selected instance is shown in

In addition, to further prove the effectiveness of Q-MOGWO, the IGD and HV values in

Metrics | Algorithms | Rank | N | Mean | Std. | Min | Max |
---|---|---|---|---|---|---|---|

IGD | Q-MOGWO | 45.000 | |||||

MA | 2.380 | 45.000 | 361.984 | 267.675 | 35.413 | 1311.067 | |

MOEA/D | 3.360 | 45.000 | 1126.209 | 1154.538 | 72.205 | 4287.531 | |

NSGA-II | 4.090 | 45.000 | 2279.091 | 2985.190 | 62.633 | 10950.070 | |

MOGWO | 4.180 | 45.000 | 1598.153 | 2014.227 | 72.585 | 10860.940 | |

0.000 | |||||||

HV | Algorithms | Rank | N | Mean | Std. | Min | Max |

Q-MOGWO | 45.000 | 0.087 | |||||

MA | 2.480 | 45.000 | 0.238 | 0.080 | 0.086 | 0.438 | |

MOEA/D | 1.380 | 45.000 | 0.178 | 0.044 | 0.391 | ||

NSGA-II | 3.500 | 45.000 | 0.269 | 0.082 | 0.116 | 0.416 | |

MOGWO | 2.640 | 45.000 | 0.255 | 0.083 | 0.107 | 0.402 | |

0.000 |

The experimental results show that Q-MOGWO outperforms the compared algorithms. The main reasons are as follows: (1) The hybrid population initialization strategy generates a high-quality initial population and enhances the global exploration of Q-MOGWO; (2) The active decoding strategy, which effectively uses the public idle time of machines and workers, decodes solutions into high-quality scheduling schemes; (3) According to the characteristics of the problem, two kinds of wolf predation strategies are designed to effectively explore the search space of solutions and increase population diversity; (4) The Q-learning-based local search strategy enhances the local search capability and efficiency of Q-MOGWO, leading to accelerated convergence.

In this paper, Q-MOGWO is proposed to solve the DFJSP-DRC with the objectives of minimizing makespan and total energy consumption. In Q-MOGWO, three scheduling rules are used to generate high-quality initial solutions, and an active decoding strategy converts solutions into reasonable scheduling schemes. Two predation strategies are designed to explore the unknown regions of solution space in the wolf predation phase. To improve the local search capability of Q-MOGWO, two kinds of neighborhood structures based on critical factories are designed. Through the effectiveness analysis, it can be found that the factory selection based on Q-learning significantly enhances the performance of Q-MOGWO. Especially when solving large-scale problems, Q-MOGWO is superior to the compared algorithms and has better non-dominated solutions.

The problem studied in this paper does not consider the impact of dynamic events on the scheduling schemes. Although worker resource is introduced, worker fatigue is not considered. Therefore, in future work, dynamic events such as machine failure and emergency order insertion will be considered, and worker fatigue will be introduced into the optimization objectives. In addition, some learning mechanisms will be introduced into the framework of Q-MOGWO to obtain stronger adaptability.

Thanks to three anonymous reviewers and the editors of this journal for providing valuable suggestions for the paper.

This work was supported by the Natural Science Foundation of Anhui Province (Grant Number 2208085MG181); the Science Research Project of Higher Education Institutions in Anhui Province, Philosophy and Social Sciences (Grant Number 2023AH051063); and the Open Fund of Key Laboratory of Anhui Higher Education Institutes (Grant Number CS2021-ZD01).

The authors confirm contribution to the paper as follows: study conception and design: Hongliang Zhang, Yi Chen and Gongjie Xu; data collection: Yuteng Zhang; analysis and interpretation of results: Hongliang Zhang, Yi Chen and Gongjie Xu; draft manuscript preparation: Hongliang Zhang, Yi Chen. All authors reviewed the results and approved the final version of the manuscript.

The data generated in this paper are available from the corresponding author on reasonable request.

The authors declare that they have no conflicts of interest to report regarding the present study.