Social engineering attacks are considered among the most hazardous attacks in cybersecurity because human vulnerabilities are often the weakest link in the entire network, and these vulnerabilities are increasingly exposed to network security risks. Defending against social engineering attacks has been the focus of many studies, but two main challenges hinder a successful solution. First, the vulnerabilities exploited in social engineering attacks are non-uniform because the attacks are multistage, which leads to incorrect defense strategies. Second, social engineering attacks happen in real time, and defense strategy algorithms based on game theory or reinforcement learning are too complex to make rapid decisions. This paper proposes a multiattribute quantitative incentive method based on human vulnerability and an improved Q-learning (IQL) reinforcement learning method that operates on human vulnerability attributes. The multiattribute incentive method is used to determine the optimal defense strategy, and the IQL method facilitates rapid decision-making during real-time attacks. The experimental results show that the proposed algorithm outperforms the traditional Q-learning (QL) and deep Q-network (DQN) approaches in time efficiency, taking 9.1% and 19.4% less time, respectively. Moreover, the proposed algorithm effectively addresses the non-uniformity of vulnerabilities in social engineering attacks and provides a reliable defense strategy based on human vulnerability attributes. This study advances social engineering attack defense by introducing an effective and efficient method for addressing the vulnerabilities of human factors in the cybersecurity domain.

With the continuous development of network technology, communication is no longer limited by distance: social networks, e-mail, and other network communication methods satisfy daily needs for communication and entertainment. The internet is becoming increasingly important, and daily life now depends on it. However, nefarious actors also lurk in the network; they attack by exploiting users’ psychological weaknesses and inducing them to disclose sensitive information [

In the second quarter of 2022, the Anti-Phishing Working Group (APWG) observed 1,097,811 phishing attacks, a new record and the worst quarter for phishing that the APWG has ever observed. The number of phishing attacks reported to the APWG has quadrupled since early 2020, when the APWG was recording between 68,000 and 94,000 attacks per month [

Unlike traditional cyber attacks, social engineering attacks mainly exploit the psychological weaknesses of the target to execute the attack, and Reference [

Network security has recently received increased attention, especially research on defense strategies against social engineering attacks. Many studies have focused on reinforcement learning, some based on game theory [

A summary of the related work on reinforcement learning applied to cybersecurity, game-learning programs, and secure game-theoretic modeling is shown in

Category | Reference | Algorithms | Main contributions |
---|---|---|---|
Reinforcement learning in network security | Zhong et al. [ | DNN, SVM, RL | An RL-based system is proposed to protect users from malicious traffic. Agents are generated through network attack and defense in an environment based on deep neural networks; the approach surpasses traditional ML algorithms and can detect adversarial samples. |
Reinforcement learning in network security | Elderman | MMQL, NQL | A method for modeling the decision-making process of network security monitoring using a game-theoretic approach. |
Reinforcement learning in network security | Chung | MDP, optimal attacker policy | A solution to attack graph transformation is proposed. Attack graphs are transformed into MDPs, and policy search is used to generate defense policies. |
Reinforcement learning in network security | Durkota | DRL, RL | Traditional reinforcement learning is integrated with deep learning, and deep reinforcement learning is used to build an autonomous network defense system that controls and protects network security. |
Game-learning programs | Durkota | MDP, RF | Examines the consequences of using the Markov game framework instead of MDPs in reinforcement learning and solves for the optimal strategy of a two-player zero-sum game. |
Game-learning programs | Ridley [ | Nash Q-learning | Based on the framework of stochastic games, Q-learning is extended to multi-agent systems. Nash Q-learning (NashQ) is proposed, which uses multi-agent Q-learning to learn the best defense strategy under the stochastic game framework. |
Game-learning programs | Littman [ | Neural fictitious self-play (NFSP) | Introduces Neural Fictitious Self-Play (NFSP), the first end-to-end deep reinforcement learning method for learning an approximate Nash equilibrium for imperfect-information games from self-play. NFSP requires no prior domain knowledge and can be extended. |
Game theoretic modeling in cyber security | Hu et al. [ | FLIPIT | Defines the FLIPIT game and the application of FLIPIT in various computer security scenarios (including APT). |
Game theoretic modeling in cyber security | Heinrich | MPC, DRL, RL, game theory | Existing security games in computer networks are reviewed and compared in terms of players, games, etc., with the overall goal of identifying and addressing security and privacy issues, where game theory can be applied to model and evaluate security issues and be used to design effective protocols. |

In this paper, a new social engineering defense model is designed by combining the essential attributes of users to provide an optimal defense strategy with low computational complexity. In addition, this paper presents, for the first time, a mechanism for quantifying user characteristics to model user vulnerability quantitatively, and a stochastic game is used to simulate the interaction between attackers and defenders. Finally, this paper applies Q-learning to stochastic games, constructs a multi-agent reinforcement learning model, proposes a Q-learning algorithm based on user attributes, and optimizes the algorithm. Multiple attackers are treated as independent agents that can learn actively and independently to collect more information for the system, and an improved Q-learning (IQL) algorithm is proposed to reduce the algorithm’s complexity and improve its efficiency. The main research contributions of this paper are as follows.

This paper proposes a mechanism for quantifying user vulnerability based on target attributes. The mechanism considers the interaction between user vulnerability and attackers and yields a more comprehensive social engineering model that improves social engineering security.

This paper considers attackers and defenders as two sides of a game and designs a multi-agent reinforcement learning model using stochastic game theory combined with Q-learning. For the first time, this paper proposes a multiobjective attribute structure learning algorithm that can provide optimal decision strategies.

This paper proposes an optimization algorithm, IQL, which can quickly obtain an optimal defense strategy by combining target attributes and user vulnerability information. Experiments demonstrate that the algorithm performs better than QL and DQN.

This paper is composed of five sections. Following this introduction, Section 2 defines the problem. Section 3 presents the model and proposes an improved QL algorithm. Section 4 reports the experimental results and analysis. Section 5 concludes the paper.

Analyses of social engineering attacks usually consider the attack method, attack technique, attack detection, and similar factors; however, all of these factors must be identified through human judgment. The threat therefore comes from the combination of these attacks and their interaction with people during the attack. Existing social engineering attack defense models ignore the role of human attributes, so this paper proposes a new quantitative approach that combines human characteristics to quantitatively evaluate each attack node using the standard notation in

Notation | Definition |
---|---|

A constant | |

State space of the game model | |

Players of the game model | |

Attack actions in the game model | |

Defensive actions in the game model | |

Attacker’s reward in the game model | |

Defender’s reward in the game model | |

The optimal attack strategy | |

The optimal defensive strategy | |

The probability of a successful attack | |

The utility function of the game model | |

The immediate reward | |

The expected utility in the next state | |

The Q function of the Q-learning algorithm | |

Learning rate | |

Discount factor |

The attributes of the nodes are divided into two types: physical attributes and target attributes. The physical attributes mainly reflect the size of a node’s impact on the entire system; they include the importance level of the node and the connection level of the node. The target attributes mainly include the features of the target, security knowledge, character, and security awareness. These attributes are directly related to the strength of the security defense.

The importance level (IL) mainly indicates the node’s importance in the entire social engineering system. This importance level primarily depends on three factors: the valid information that can be obtained, the impact on subsequent attacks, and whether trust is established [

The connection level (CL), which indicates the importance of the node’s associations in the social engineering system, is determined by the stage of the social engineering model in which the node is located and the number of other nodes connected to this node.

The above values of the physical attributes and target attributes are set by the system administrator and mapped to a vector

In the above equation,

Attack resource consumption (AR): the consumption of attack resources in the attack preparation phase, the attack implementation phase, target information collection, scripting, trust building, and other actions that consume time [

Defense resource consumption (DR): the resources consumed in resisting social engineering attacks, including the time spent preventing attackers from obtaining protected information, detecting attacks, and identifying attacks that aim to collect information [

Loss recovery consumption (LR): the time consumed in recovering from the loss caused by the attack, such as replacing a secret key, changing a password, or taking other actions to protect one’s property and information [

In the human vulnerability-based social engineering model, an attacker can use human vulnerabilities to perform social engineering attacks and obtain sensitive information. For the attacker, the greater the vulnerability found, the greater the harm and the greater the gain. Defense against social engineering focuses on applying the corresponding defenses to detected attacks. Ideally, the target has no exploitable vulnerabilities and is safe. In practice, however, target attributes vary and vulnerabilities manifest in different ways, so the attacker can always find vulnerabilities to exploit, and the defenses can lag and collapse when an attack occurs. The attacker and the defender are like the two sides of a game: the attacker tries to obtain the maximum reward, and the defender pursues the minimum loss. The interaction between the attacker and the defender can be treated as a stochastic game, and a stochastic game model can be used to analyze the best defense strategy for the defender. This paper designs a new reward quantification method to quantify the role of vulnerability in the model, considering the properties of the target and the interaction between the vulnerability and the attack. The social engineering attack model is shown in

The structure of the social engineering system model is complex, with multiple stages. At each stage, the attacker does not have access to information about the entire system and takes random actions based on the information obtained at this stage. At the same time, neither the attacker nor the defender has access to the game information of the adversary or the gain of each action. This paper describes the game between the attacker and the defender in the social engineering system as a stochastic game model with incomplete information. The model is defined as follows.

where

The attacker’s utility consists of the attack gain and the expected utility of the next state, and it can be expressed as

In the above equation,

Here, the SESM is an attribute-based social engineering security level indicator, and the SESM mainly reflects the node’s importance.

By formalizing the representations of the social engineering attack participants, the entire defined social engineering attack process is constructed as a game model

The utility function in the entire social engineering system, mainly considering the interaction between the target vulnerability and attacker, is quantified by the attack utility function

In social engineering defense strategies, traditional approaches use Q-learning. This is because Q-learning algorithms converge quickly and can compute optimal policies. This method is widely used; however, as the system’s complexity increases, the system’s unstable and dynamic nature leads to an increase in the convergence time of the Q-learning method. Researchers have proposed relevant solutions combined with deep learning to improve the convergence speed of Q-learning algorithms. However, these solutions require a large amount of computation and often do not guarantee the algorithm’s convergence in computing the optimal policy. The traditional formulation of the Q algorithm can be expressed as

where s’ is the next state,

Proof:

According to

In order to prove convergence, it is necessary to prove that the sequence

According to π(s) and π(sʹ), we have

If π(s)Q(s) − π(sʹ)Q*(sʹ) ≥ 0

According to the above proof, the model is convergent, where the sequence

The Q-value update mechanism is also optimized to further improve computation speed. The Q-values of the current states of different learning processes can be updated simultaneously in parallel, so there is no need to wait until learning has finished before updating. All Q-learning procedures run simultaneously and synchronously, and the newly updated Q-values replace the previous ones. This is faster and more effective; the algorithm is shown in Algorithm 1.
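The batched update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the function name `synchronous_q_update`, the transition tuple format, and the default values of `alpha` and `gamma` are all hypothetical. Every transition in a batch reads from the same snapshot of the Q-table, so updates for unrelated states do not wait on one another.

```python
from typing import Dict, List, Tuple

QTable = Dict[str, Dict[str, float]]
# Hypothetical transition format: (state, action, reward, next_state).
Transition = Tuple[str, str, float, str]


def synchronous_q_update(q: QTable, batch: List[Transition],
                         alpha: float = 0.1, gamma: float = 0.9) -> QTable:
    """Apply the standard Q-learning rule to a whole batch in one step.

    Every target is computed from the same snapshot `q`, so the updates are
    independent of each other and could be handed to parallel workers.
    If two transitions share a (state, action) pair, the later one wins.
    """
    new_q = {s: dict(actions) for s, actions in q.items()}  # copy, leave q intact
    for s, a, r, s_next in batch:
        next_values = q.get(s_next, {})
        # A state with no recorded actions is treated as terminal (value 0).
        best_next = max(next_values.values()) if next_values else 0.0
        new_q[s][a] = q[s][a] + alpha * (r + gamma * best_next - q[s][a])
    return new_q


# Example with made-up numbers: two unrelated states updated together.
q0 = {"S1": {"a1": 0.0, "a2": 0.0}, "S2": {"a3": 1.0, "a4": 2.0}, "S3": {}}
q1 = synchronous_q_update(q0, [("S1", "a1", 8.0, "S2"), ("S2", "a4", 7.0, "S3")])
```

Because the loop body only reads `q` and writes distinct entries of `new_q`, it could be distributed across workers without locking, which is one way to read the speed-up claimed for the parallel update.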

According to Algorithm 1, this paper quantifies two global q variables,

In this section, we construct a simple social engineering system and analyze the simulation results.

The social engineering system mainly simulates phishing attacks, and the system’s architecture is shown in

The values of IL, CL, and TA for all nodes are shown in

Nodes | IL | AL | DS |
---|---|---|---|
Information research N1 | 2 | 1 | 2 |
Attack path N2 | 1 | 2 | 1 |
Trust building N3 | 2 | 3 | 3 |
Manipulating target behavior N4 | 3 | 2 | 3 |

By collecting target information, the social engineering attacker discovers the target’s attributes, finds the target’s vulnerabilities, and executes an attack on the target; the purpose of the attack is to obtain sensitive information or goods. The Common Attack Pattern Enumeration and Classification (CAPEC) database, developed and maintained by MITRE, records known cyber attack patterns [

Vulnerability no. | CAPEC ID | Description |
---|---|---|
V1 | CAPEC 118 | Obtain target information |
V2 | CAPEC 98 | Detecting vulnerabilities |
V3 | CAPEC 427 | Target properties |
V4 | CAPEC 416 | Manipulate target behavior |
V5 | CAPEC 173 | Action spoofing |
V6 | CAPEC 151 | Identity spoofing |
V7 | CAPEC 137 | Parameter injection |

States | Description |
---|---|
S1 | Original state |
S2 | Obtain target information |
S3 | Send social engineering attacks |
S4 | Gain the trust of the target |
S5 | Execute attacker operations |

Action | Description |
---|---|
a1 | Utilizing V1 on N1 |
a2 | Utilizing V3 on N1 |
a3 | Utilizing V7 on N2 |
a4 | Utilizing V6 on N2 |
a5 | Utilizing V3 on N3 |
a6 | Utilizing V5 on N3 |
a7 | Utilizing V5 on N4 |
a8 | Utilizing V4 on N4 |
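The attack-action table can be encoded directly as data, which makes it easy to look up which actions target a node or which CAPEC pattern an action relies on. The sketch below is illustrative only: the names `AttackAction`, `ATTACK_ACTIONS`, `CAPEC_ID`, `actions_on_node`, and `capec_for_action` are assumptions, while the values are copied from the vulnerability and attack-action tables.

```python
from typing import Dict, List, NamedTuple


class AttackAction(NamedTuple):
    vulnerability: str  # V1..V7, as listed in the vulnerability table
    node: str           # N1..N4, as listed in the node table


# Values copied from the attack-action table.
ATTACK_ACTIONS: Dict[str, AttackAction] = {
    "a1": AttackAction("V1", "N1"),
    "a2": AttackAction("V3", "N1"),
    "a3": AttackAction("V7", "N2"),
    "a4": AttackAction("V6", "N2"),
    "a5": AttackAction("V3", "N3"),
    "a6": AttackAction("V5", "N3"),
    "a7": AttackAction("V5", "N4"),
    "a8": AttackAction("V4", "N4"),
}

# Values copied from the vulnerability table.
CAPEC_ID: Dict[str, int] = {
    "V1": 118, "V2": 98, "V3": 427, "V4": 416, "V5": 173, "V6": 151, "V7": 137,
}


def actions_on_node(node: str) -> List[str]:
    """Return the attack actions that target the given node, in sorted order."""
    return sorted(a for a, act in ATTACK_ACTIONS.items() if act.node == node)


def capec_for_action(action: str) -> int:
    """Return the CAPEC ID of the vulnerability used by an attack action."""
    return CAPEC_ID[ATTACK_ACTIONS[action].vulnerability]
```

For example, `actions_on_node("N2")` returns `["a3", "a4"]`, and `capec_for_action("a8")` returns `416`.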

Action | Description |
---|---|
d1 | Check personal information exposure |
d2 | Robust cybersecurity training |
d3 | Implement an audit log written to a separate host, validated before use |
d4 | Authentication processes, multifactor authentication |
d5 | Cybersecurity training |
d6 | Avoid clicking suspicious links; robust cybersecurity training |
d7 | Robust cybersecurity training |
d8 | Robust cybersecurity training |

The corresponding sets of defense and attack strategies are shown in

In this paper, we introduce the concepts of players and strategies and use them to construct a customized reinforcement learning model. The game model has two players, and each player learns a new strategy. Each player’s strategy is formulated according to the player’s current state and the actions taken. Each player updates its action probabilities by observing the opponent’s behavior and adjusting its actions accordingly. The best behavioral strategy, which yields the maximum benefit, is finally acquired through reinforcement learning. The model in this paper has five states, and the reward of each state is shown in

States | Attacker rewards | Defender rewards |
---|---|---|
S1 | 8.32 | −8.32 |
S2 | 7.14 | −7.14 |
S3 | 12.23 | −12.23 |
S4 | 9.18 | −9.18 |
S5 | 20 | −20 |
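The reward table above is zero-sum: in every state the defender's reward is the negative of the attacker's. A short sketch makes that check explicit. The dictionary layout and the helper name `is_zero_sum` are illustrative assumptions; the numbers are copied from the table.

```python
from typing import Dict

# Values copied from the reward table.
ATTACKER_REWARD: Dict[str, float] = {
    "S1": 8.32, "S2": 7.14, "S3": 12.23, "S4": 9.18, "S5": 20.0,
}
DEFENDER_REWARD: Dict[str, float] = {
    "S1": -8.32, "S2": -7.14, "S3": -12.23, "S4": -9.18, "S5": -20.0,
}


def is_zero_sum(attacker: Dict[str, float], defender: Dict[str, float],
                tol: float = 1e-9) -> bool:
    """Check that both tables cover the same states and the rewards cancel."""
    if attacker.keys() != defender.keys():
        return False
    return all(abs(attacker[s] + defender[s]) <= tol for s in attacker)


# State in which a successful attack costs the defender the most.
costliest_state = max(ATTACKER_REWARD, key=ATTACKER_REWARD.get)
```

With these values `is_zero_sum(ATTACKER_REWARD, DEFENDER_REWARD)` holds, and `costliest_state` is `"S5"`, the state in which the attacker executes its operations.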

States | Attacker strategies | Defender strategies |
---|---|---|
S1 | {0.32, 0.21, 0.47} | {0.55, 0.24, 0.21} |
S2 | {0.40, 0.60} | {0.34, 0.33, 0.33} |
S3 | {0.40, 0.60} | {0.33, 0.34, 0.33} |
S4 | {0.35, 0.65} | {0.46, 0.54} |
S5 | {1, 0} | {0.81, 0.19} |
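Each row of the strategy table is a probability distribution over the actions available in that state. As a sketch of how such a table might be used, the snippet below checks that every distribution sums to one and samples an action index from it. The helper names and the 0-based action indices are assumptions; the table itself does not say which action each entry refers to.

```python
import random
from typing import Dict, List, Sequence

# Values copied from the strategy table.
ATTACKER: Dict[str, List[float]] = {
    "S1": [0.32, 0.21, 0.47], "S2": [0.40, 0.60], "S3": [0.40, 0.60],
    "S4": [0.35, 0.65], "S5": [1.0, 0.0],
}
DEFENDER: Dict[str, List[float]] = {
    "S1": [0.55, 0.24, 0.21], "S2": [0.34, 0.33, 0.33],
    "S3": [0.33, 0.34, 0.33], "S4": [0.46, 0.54], "S5": [0.81, 0.19],
}


def is_distribution(probs: Sequence[float], tol: float = 1e-6) -> bool:
    """True if all entries are non-negative and they sum to 1 within `tol`."""
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) <= tol


def sample_action(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one action index according to the mixed strategy `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def most_likely_action(probs: Sequence[float]) -> int:
    """Index of the action with the highest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])
```

For instance, `most_likely_action(DEFENDER["S1"])` is `0`, the first defender action in state S1, and sampling from `ATTACKER["S5"]` always yields index `0` because the second entry has probability zero.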

In this paper, we propose an improved Q-learning algorithm that utilizes parallel computing to improve computational efficiency. We conducted experiments using 1, 10, 20, and 30 parallel learning processes to examine the relationship between the number of parallel learning processes and decision-making. The results are shown in

According to the literature survey, few studies apply game theory and reinforcement learning algorithms to the problem of social engineering defense strategy generation. Comparing the algorithm in this paper with methods that use different performance indicators and characteristics cannot demonstrate its performance. Two commonly used reinforcement learning algorithms, the QL algorithm and the DQN algorithm [

In this paper, the simulation experiments on social engineering defense strategy generation are carried out in the same network simulation environment, and the experimental data are analyzed statistically. The simulation results obtained using Q-learning, DQN, and the algorithm in this paper in the social engineering stochastic game scenario are shown in

The above results show that in the social engineering stochastic game scenario, the defender rewards obtained by the IQL solution are significantly greater than those obtained by the Q-learning and DQN solutions. This experiment shows that defenders can reach their optimal defense strategy faster under the IQL reinforcement learning mechanism than under the other two algorithms.

In a single iteration, the IQL algorithm can update different unrelated states simultaneously, and each update can be considered an independent process. Therefore, the decision time of the IQL algorithm is always shorter than that of the QL algorithm. To verify this conclusion, we set up comparison experiments to measure the policy learning time of the QL algorithm, the DQN algorithm, and the algorithm in this paper with different greedy values. The experimental results are shown in

This is because the more states there are in the iterative process, the more computation is required and the higher the computational complexity. In general, the best performance with the fewest iterations is obtained when ε = 0.04, where the completion time of the IQL algorithm is reduced by 9.1% and 19.4% compared with the DQN and QL algorithms. The calculation formula is as follows:

In the above equation,

In this paper, we propose a reinforcement learning model based on game theory that can generate optimal social engineering defense strategies for social engineering attack models to enhance social engineering defenses and reduce losses. Because traditional methods of social engineering defense strategy generation consider only the technical aspects of defense, humans are idealized and regarded as having consistent properties, which leads to unsatisfactory defense strategies. Considering the interaction between target vulnerabilities and social engineering attacks, a quantification mechanism based on multiple target attributes is proposed. The attacker and defender are modeled as the two sides of a stochastic game, and the optimal defense strategies of the defender are analyzed. To improve the real-time performance and effectiveness of the defense, a Q-learning algorithm based on the game is optimized, and a multistate independent parallel learning optimization method is proposed to improve the learning efficiency and to generate the optimal defense strategy quickly. According to the experimental simulation results, the average time needed to create the optimal policy is reduced by 12.5%~20% with the optimization method proposed in this paper compared with the QL and DQN algorithms. However, some significant problems remain with the approach in this paper; for example, the model construction is still rough and not sufficiently detailed, the parallel task scheduling algorithm needs further research, and the attack recovery method presented here could be improved.

Thanks are due to Kangfeng Zheng and Chunhua Wu for assistance with the experiments and to Xiujuan Wang and Bin Wu for valuable discussion. We are grateful for the assistance of the Beijing Natural Science Foundation (4202002).

This research was funded by the Beijing Natural Science Foundation (4202002).

The authors declare that they have no conflicts of interest to report regarding the present study.