Fraudulent transactions threaten the finances of individuals and institutions across the globe. This research focuses on developing a mechanism that integrates optimized machine learning algorithms to ensure the security and integrity of digital transactions. The proposed methodology proceeds in three stages. Firstly, the Synthetic Minority Oversampling Technique (SMOTE) is applied to balance the data. Secondly, the balanced data is fed to nature-inspired Meta-Heuristic (MH) algorithms, namely Binary Harris Hawks Optimization (BinHHO), Binary Aquila Optimization (BAO), and Binary Grey Wolf Optimization (BGWO), for feature selection; BinHHO performed best of the three. Thirdly, the features selected by BinHHO are fed to supervised learning algorithms to classify transactions as fraud or non-fraud. The efficiency of BinHHO is also compared with other popular MH algorithms. BinHHO achieved the highest accuracy of 99.95% and has the most significant positive effect on the performance of the proposed model.
Digital fraud is becoming more common with the rising use of electronic cards for online and general purchases in e-banking. Most online fraud has rapidly shifted to mobile and Internet channels. Bank card enrolment through smart mobile devices, the initial step in mobile transactions, has become the main target of fraud attempts. Furthermore, fraudsters constantly change their techniques to avoid detection. As per data provided by the Reserve Bank of India (RBI), Indian lenders reported 4,071 fraud cases from April to September 2021, amounting to Rs 36,342 crore. As a result, the regulator is steadily reaching out to the public to raise awareness about financial fraud. Researchers now concentrate more on fraudster activities and continually refine their detection techniques. However, most algorithms are still incapable of solving all problems, necessitating the continued efforts of scientists and engineers to find more capable algorithms [
Many MH algorithms have been proposed in the literature to address the Feature Selection (FS) problem in various applications. MH algorithms are classified by how they use the search space: single vs. multiple neighbourhood structures, nature-inspired vs. non-nature-inspired, dynamic vs. static objective functions, population-based vs. single-point search, etc. Each of these algorithms has advantages and disadvantages in handling a specific problem. The following properties must hold for any metaheuristic model [
The exploration and exploitation capabilities of an algorithm determine its performance.
Hybridized algorithms combine two or more metaheuristics. The advantage of hybridization is that problems arising in one method can be addressed by another.
Unlike exact optimization algorithms, iterative methods, and simple greedy heuristics, metaheuristics can frequently find good results with less computational effort in terms of speed and convergence rate.
Metaheuristic algorithms demonstrate their robustness when their selected features are fed to a classifier.
The rest of this paper is organized as follows. Section 2 describes existing digital transaction fraud detection mechanisms. Section 3 introduces the essential concepts of the feature selection techniques. Results are given in Section 4. Finally, the paper ends with the conclusion and future scope.
The amount of available data has grown in recent years as new methods have made data collection feasible across many fields. This can increase computational complexity (space and time) when executing machine and deep learning algorithms, and redundant data reduces an algorithm's efficiency. In classification tasks, irrelevant data reduces accuracy and performance significantly. Therefore, FS has gained prominence in the scientific community in recent years. Metaheuristics offer a collection of methods for creating heuristic optimization techniques. MH algorithms can be divided into nature-inspired and non-nature-inspired; nature-inspired algorithms are further divided into four categories: evolutionary algorithms, swarm-based algorithms, physics-based algorithms, and human-based algorithms. This literature segment gives a broad survey of the most useful MH algorithms and highlights some of their applications in recent works.
Some hybrid algorithms have been reported to outperform native algorithms in feature selection. Zhang et al. [
Despite differences among MH optimization algorithms, the optimization process is formulated in two stages: Exploration (EXPO) and Exploitation (EXPL). These stages give broad coverage and analysis of the search region, achieved through the algorithm's candidate solutions when solving searching and hunting problems, as mentioned in
Type  Author  Year  Optimization technique  Inspired by
Evolutionary  [ ]  2019  Genetic Algorithm  Darwinian theory of evolution
  [ ]  2020  Genetic Programming  Charles Darwin's theory of natural evolution
  [ ]  2019  Differential Evolution  The natural phenomenon of evolution
  [ ]  2008  Biogeography-Based Optimizer  Biogeography related to species migration
  [ ]  2014  Probability-Based Incremental Learning  The genotype of a whole population (probability vector)
Swarm intelligence  [ ]  2021  Particle Swarm Optimization  The natural behaviours of swarm particles
  [ ]  2005  Ant Colony Optimization  Ants depositing pheromones on the ground
  [ ]  2015  Moth Flame Optimization  The moth's navigation method in nature
  [ ]  2022  Harris Hawks Optimization  The behaviour of Harris hawks in nature
  [ ]  2021  Aquila Optimization  The Aquila's behaviours in nature
  [ ]  2020  Mayfly Optimization Algorithm  The flight and mating behaviour of adult mayflies
  [ ]  2021  Jellyfish Algorithm  The behaviour of jellyfish in the ocean
  [ ]  2018  Whale Optimization  The behaviour of humpback whales
  [ ]  2018  Grey Wolf Optimization  The hunting behaviour of grey wolves
  [ ]  2019  Henry Gas Solubility Optimization  The behaviour described by Henry's law
  [ ]  2020  Hide Objects Game Optimization  A game of finding a hidden object
Physics-based  [ ]  2011  Galaxy-Based Search Algorithm  The spiral arms of spiral galaxies
  [ ]  2018  Gravitational Local Search  The law of gravity and mass interactions
  [ ]  2012  Charged System Search  Principles from physics and mechanics
Human-based  [ ]  2012  Teaching-Based Learning  The influence of a teacher on the output of learners
  [ ]  2018  Socio Evolution & Learning Optimization  Social learning behaviour of humans
  [ ]  2011  Brain Storm Optimization  The brainstorming process
  [ ]  2019  Poor & Rich Optimization Algorithm  The efforts of the rich to achieve wealth and improve their economic situation
  [ ]  2021  Gaining-Sharing Knowledge-based Algorithm  The philosophy of gaining and sharing knowledge during the human life span
Hybrid  [ ]  2013  PSO+GA  Particle Swarm Optimization + Genetic Algorithm
  [ ]  2018  HFPSO  Firefly + Particle Swarm Optimization
  [ ]  2020  HHOSA  Harris Hawks + Simulated Annealing
  [ ]  2021  GWOHHO  Grey Wolf + Harris Hawks
  [ ]  2022  AOAAO  Aquila + Arithmetic Optimization
The proposed method is implemented in several phases. Firstly, the raw transactional data are preprocessed using the Synthetic Minority Oversampling Technique (SMOTE). Optimization methods then select the optimal reduced subset of features according to the fitness function. After that, conventional supervised learning models (KNN & SVM) measure the performance of the metaheuristic feature selection method. The proposed approach is shown in
The digital transaction fraud detection datasets from Kaggle and UCI repositories consist of anonymized credit card and payment transactions labelled as fraudulent or genuine. The two benchmark datasets, DTS1 [
Dataset/Description  DTS1  DTS2
Type  Payment type dataset  European credit card dataset
#Samples  1048573  284807
#Features  10  31
#Non-frauds  1047432  284315
#Frauds  1141  492
#Reduced features  8  10
#Balanced samples  2047570  442268
Different experiments are performed on real-world datasets to assess model performance. The credit card fraud data were accessed from the UCI machine learning repository. The dataset is highly imbalanced, which degrades classification accuracy, so the oversampling technique SMOTE is used to balance the data. SMOTE [
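To make the balancing step concrete, the sketch below implements the core SMOTE idea from scratch: each synthetic minority sample is an interpolation between a minority point and one of its k nearest minority-class neighbours. This is an illustrative sketch only; in practice a library implementation such as imbalanced-learn's `SMOTE` would be used, and the function name `smote` here is ours.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: synthesize n_new minority samples by
    interpolating between a minority sample and one of its k nearest
    minority-class neighbours (illustrative, not the full algorithm)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude self-distance
    nn = np.argsort(d, axis=1)[:, :k]      # k nearest neighbours per sample
    out = []
    for _ in range(n_new):
        i = rng.integers(n)                # pick a random minority sample
        j = nn[i, rng.integers(min(k, n - 1))]  # one of its neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)
```

The synthetic points always lie on line segments between existing minority samples, which is why SMOTE tends to densify rather than merely duplicate the minority class.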
MH algorithms start their optimization process with randomly generated candidate solutions. The produced collection of solutions is improved by a set of rules during the optimization process and repeatedly evaluated by a defined objective function. Population-based methods seek the optimal solution stochastically, so obtaining it in a single iteration is not guaranteed. Nonetheless, a good collection of random solutions and repetition of the optimization process increase the probability of obtaining the global optimum for the specified problem. The details (inputs and parameter settings) of the specified MH algorithms are given in
Optimization technique  Parameters  Value 

BGWO  # Wolves  10 
#Iterations  100  
BHHO  # Harris Hawks  10 
#Iterations  100  
BAO  # Aquila birds  10 
#Iterations  100  
F_Obj  Any benchmark function 
BAO [
1. X1 → Expanded Exploration (EEXPO) → High soar with a vertical stoop.
2. X2 → Narrowed Exploration (NEXPO) → Contour flight with a short glide attack.
3. X3 → Expanded Exploitation (EEXPL) → Low flight with a slow descent attack.
4. X4 → Narrowed Exploitation (NEXPL) → Walking and grabbing the prey.
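The four behaviours above are dispatched by the stage of the search. In the original Aquila Optimizer, roughly the first two-thirds of the iterations are spent on the exploration pair (X1/X2) and the remainder on the exploitation pair (X3/X4), with a random choice inside each pair; the small sketch below illustrates that scheduling (the function name is ours):

```python
import random

def aquila_phase(t, T):
    """Pick the Aquila search behaviour for iteration t of T: the first
    ~2/3 of iterations explore (X1 or X2), the rest exploit (X3 or X4),
    choosing within each pair uniformly at random."""
    if t <= (2 / 3) * T:                      # exploration stage
        return "X1" if random.random() < 0.5 else "X2"
    return "X3" if random.random() < 0.5 else "X4"
```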
BGWO is a metaheuristic proposed by [
Step1. Searching or Exploratory for the prey
Step2. Hunting, chasing, & proximate the prey
Step3. Following, encircling, and harassing the prey until it stops
Step4. Attacking or threatening the prey
For N features there are 2^N possible feature subsets, enclosing the features within a large space that must be intensively explored. The BGWO technique is applied adaptively to examine the features in search of the optimal subset. Hence, the extracted features are merged with BGWO to evaluate the respective positions of the grey wolves as given [
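The wolf-position update behind BGWO follows Mirjalili et al.'s continuous GWO rule: each wolf moves toward the average of three positions guided by the alpha, beta, and delta leaders, with the coefficient `a` decreasing from 2 to 0 over iterations to shift from exploration to exploitation. A continuous sketch (binarization via a transfer function would follow in the binary variant):

```python
import numpy as np

def gwo_step(wolves, alpha, beta, delta, a, rng):
    """One continuous GWO position update: each wolf moves toward the
    mean of three positions guided by the alpha, beta, and delta
    leaders. `a` decreases linearly from 2 to 0 across iterations."""
    new = np.empty_like(wolves)
    for i, X in enumerate(wolves):
        guided = []
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(X.shape), rng.random(X.shape)
            A = 2 * a * r1 - a             # |A| > 1 explores, |A| < 1 exploits
            C = 2 * r2
            D = np.abs(C * leader - X)     # distance to this leader
            guided.append(leader - A * D)
        new[i] = np.mean(guided, axis=0)   # average of the three guides
    return new
```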
BinHHO [
Harris hawks are intelligent birds that track and detect their prey with powerful eyes. A Harris hawk perches randomly at specific sites and waits until it detects prey; if no prey is found, the hawk waits, observes the situation, and monitors the site. Hawks follow two perching strategies: one based on the positions of other family members, and the other based on the target's position. In the HHO algorithm, the best candidate solution is treated as the prey. This phase is expressed by a mathematical model that alters the position of a Harris hawk in the search space by
The HHO EXPO process tends to distribute search agents across all desirable areas of search space while also enhancing the randomness in HHO.
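In the original HHO formulation, the exploration update chooses at random between perching relative to a random flock member and perching relative to the prey and the flock's mean position. A sketch of that update (the variable names `X_rand`, `X_rabbit`, `X_mean` are ours):

```python
import numpy as np

def hho_explore(X, X_rand, X_rabbit, X_mean, lb, ub, rng):
    """HHO exploration update: with probability 0.5 a hawk moves relative
    to a random member of the flock; otherwise it moves relative to the
    prey (rabbit) and the mean position of all hawks, within [lb, ub]."""
    q, r1, r2, r3, r4 = rng.random(5)
    if q >= 0.5:                            # perch on a random member
        return X_rand - r1 * np.abs(X_rand - 2 * r2 * X)
    # perch relative to prey and flock mean
    return (X_rabbit - X_mean) - r3 * (lb + r4 * (ub - lb))
```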
During the HHO EXPL process, search agents can exploit the closest optimal solutions. HHO originates four mechanisms to model the EXPL phase based on the distinct hunting scenarios and prey’s appropriate action (rabbit).
When the rabbit's energy is low, the hawk can easily exploit (attack) it.
The rabbit's (prey's) energy decreases while it escapes from the attacker (hawk).
When the rabbit's energy level is still high, the hawk keeps chasing it.
As the rabbit's energy decreases, the hawk continuously changes its position to catch it.
Based on these mechanisms, defined operations are performed in the algorithm: decreasing the rabbit's energy, updating positions in the search space, and carrying out the exploitation phase. A peculiarity of this metaheuristic is that the exploitation phase is performed in four modes, which lets it explore its maximum capacity in optimal feature selection. The mathematical model to calculate the prey's energy during escape is expressed in
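In the original HHO paper, the escaping energy follows E = 2·E0·(1 − t/T), where E0 is drawn uniformly from [−1, 1] at each iteration t and T is the maximum number of iterations; |E| ≥ 1 triggers exploration and |E| < 1 exploitation. As code:

```python
def prey_energy(E0, t, T):
    """Escaping energy of the prey in HHO: E = 2*E0*(1 - t/T).
    E0 is the initial energy drawn from [-1, 1] each iteration;
    |E| >= 1 selects exploration, |E| < 1 selects exploitation."""
    return 2 * E0 * (1 - t / T)
```

Because the envelope 2·(1 − t/T) shrinks linearly, later iterations are increasingly biased toward exploitation.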
The soft roundup (soft besiege) behaviour is depicted using the given mathematical model [
In this case, the Harris hawk performs a sudden attack. The mathematical illustration of this behaviour is:
Here, the Harris hawks encircle the rabbit softly to tire it further before performing the unexpected attack. The mathematical model of this behaviour is
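The two basic besiege modes described above can be sketched directly from Heidari et al.'s HHO equations: in the soft besiege the rabbit still has energy, so the hawk circles it using a random jump strength J = 2(1 − r5); in the hard besiege the exhausted rabbit is attacked directly.

```python
import numpy as np

def soft_besiege(X, X_rabbit, E, rng):
    """Soft besiege (|E| >= 0.5): the rabbit still has energy, so the
    hawk encircles it with random jump strength J = 2*(1 - r5)."""
    J = 2 * (1 - rng.random())
    return (X_rabbit - X) - E * np.abs(J * X_rabbit - X)

def hard_besiege(X, X_rabbit, E):
    """Hard besiege (|E| < 0.5): the exhausted rabbit is attacked
    directly from the hawk's current distance."""
    return X_rabbit - E * np.abs(X_rabbit - X)
```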
Cases 3 & 4 represent the intelligent behaviour of Harris hawks in the search space when trying to search and attack. The random variable 'r' defines whether the prey successfully escapes before the attack: if r < 0.5, the prey escapes before the attack; otherwise, it does not.
Algorithm 1 is proficient in searching the binary search region. In BHHO, the hawk location is updated in multiple stages; the second stage applies the S-shaped transfer function.
The S-shaped transfer function is T(X); a random number in the range (0,1] is represented as rnd(0,1], and ¬S denotes the complement of S.
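The S-shaped transfer function referred to here is the sigmoid commonly used to binarize continuous metaheuristic positions: a dimension's bit is set to 1 when a uniform random draw falls below T(x). A minimal sketch (function names are ours):

```python
import math
import random

def s_transfer(x):
    """S-shaped (sigmoid) transfer function mapping a continuous
    position component to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rnd=random.random):
    """Set each bit to 1 when a uniform draw falls below T(x),
    converting a continuous hawk position to a binary feature mask."""
    return [1 if rnd() < s_transfer(x) else 0 for x in position]
```

Large positive components thus select a feature with high probability, and large negative ones deselect it.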
Initially, the HHO technique was designed to tackle problems that require continuous optimization. Feature selection, on the other hand, is a binary problem: every solution is represented by the binary values zero and one [
The classifier parameters (i.e., inputs) were adjusted using the optimization algorithms to yield better classification accuracy. This process was evaluated using the popular supervised learning techniques KNN and SVM. This work utilized three different distance functions for KNN, namely the Minkowski, Euclidean, and Cityblock distances. These functions reflect the behaviour of KNN at different k values, where k varies over the nearest-neighbour counts k ∈ {3, 5, 7, 9}. The SVM classifier was mapped to the linear, polynomial, and RBF kernel functions, and the kernel scale parameter (σ) was varied stepwise from 1 to 5. The experimental results for digital transaction fraud detection on the benchmark datasets (in
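The KNN configuration described above can be sketched from scratch to make the three distance functions explicit. This is an illustrative implementation, not the paper's code; the Minkowski order p=3 is our assumption for demonstration (p=2 reduces to Euclidean, p=1 to Cityblock).

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k=3, metric="euclidean"):
    """Minimal KNN with the three distance functions used here:
    Minkowski (p=3, an illustrative choice), Euclidean, and Cityblock.
    Predicts binary labels by majority vote over the k neighbours."""
    diff = Xte[:, None, :] - Xtr[None, :, :]   # (n_test, n_train, n_feat)
    if metric == "cityblock":
        d = np.abs(diff).sum(-1)
    elif metric == "minkowski":                # p=3 for illustration
        d = (np.abs(diff) ** 3).sum(-1) ** (1 / 3)
    else:                                      # euclidean
        d = np.sqrt((diff ** 2).sum(-1))
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest training points
    votes = ytr[nn]
    return (votes.mean(axis=1) >= 0.5).astype(int)  # majority vote
```

Odd k values, as used in the experiments, avoid voting ties for binary labels.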
Dist Fun/Metrics  BAO  BGWO  BHHO
  AC (%)  SE  SP  PR  F1S (%)  AC (%)  SE  SP  PR  F1S (%)  AC (%)  SE  SP  PR  F1S (%)
Minkowski  K = 3  91.1  0.91  0.89  0.91  92.03  93.21  93.04  0.95  0.94  95.34  97.54  0.96  0.95  0.98  99.17 
K = 5  93.7  0.93  0.92  0.93  93.88  98.4  98.67  0.94  0.96  98.01  98.78  0.98  0.98  0.99  100  
K = 7  93.4  0.93  0.89  0.94  93.03  97.03  97.12  0.94  0.961  98.1  98.04  0.98  0.98  1  99.96  
K = 9  93  0.93  0.87  0.93  94  94.87  97  0.93  0.93  96.27  97.67  0.98  0.98  0.98  99.97  
Euclidean  K = 3  94.6  0.94  0.89  0.94  94.23  96.67  95.56  0.94  0.93  98.19  98.87  0.99  0.97  0.98  99.21 
K = 5  95  0.94  0.93  0.94  96.78  97.01  98.03  0.94  0.94  98.23  
K = 7  93.9  0.94  0.88  0.93  96.48  96.45  96  0.93  0.94  97.67  98  1  0.96  0.99  99.97  
K = 9  93  0.93  0.87  0.93  95.44  95.88  96.01  0.94  0.93  97.2  98.87  0.98  0.96  0.98  99.67  
Cityblock  K = 3  94  0.94  0.89  0.94  95.05  94.58  95  0.94  0.937  95.78  97.2  0.97  0.95  0.98  98.89 
K = 5  94.3  0.94  0.88  0.94  96.23  94.33  94.11  0.93  0.94  96.67  94.73  1  0.97  0.99  99.31  
K = 7  93.8  0.94  0.87  0.93  95.61  95.32  96.7  0.93  0.93  96.33  97.12  0.98  0.96  0.98  100  
K = 9  94.7  0.94  0.91  0.94  95.23  95  96.22  0.94  0.94  96.03  96.45  0.97  0.98  0.98  99.45 
Kernel Function/PM  BAO  BGWO  BHHO
Kernel scale  AC (%)  SE  SP  PR  F1S (%)  AC (%)  SE  SP  PR  F1S (%)  AC (%)  SE  SP  PR  F1S (%)
Linear  1  78.94  0.79  0.79  0.8  80.42  86.49  0.86  0.86  0.88  89.78  92.01  0.92  0.93  0.93  93.38 
2  79  0.79  0.79  0.7  82.84  86.93  0.87  0.87  0.88  89.84  93.48  0.93  0.94  0.93  94.82  
3  80.25  0.8  0.8  0.82  84.83  91.44  0.91  0.92  0.92  96.49  93.65  0.93  0.94  0.93  94.14  
4  80  0.8  0.8  0.81  80.26  89.93  0.9  0.92  0.91  95.21  90.32  0.90  0.92  0.92  91.38  
5  79.22  0.79  0.8  0.81  80  89.12  0.9  0.91  0.9  91.87  90.38  0.90  0.92  0.92  91.35  
RBF  1  85.67  0.86  0.88  0.84  91.29  90.46  0.9  0.9  0.91  93.28  92.85  0.93  0.92  0.91  93.87 
2  90.05  0.9  0.92  0.92  92.78  91.57  0.91  0.92  0.92  94.8  94.85  0.94  0.93  0.94  94.68  
3  88.94  0.89  0.9  0.91  91.98  91.43  0.91  0.92  0.92  94.12  
4  87.42  0.88  0.88  0.89  90.46  89.47  0.89  0.9  0.91  92.11  93.96  0.94  0.93  0.94  94.49  
5  87  0.87  0.89  0.89  90.12  90.37  0.9  0.91  0.9  90.06  93  0.93  0.92  0.94  93.89  
Polynomial  1  75.33  0.76  0.77  0.78  78.13  89.77  0.89  0.91  0.9  88.73  90.72  0.91  0.92  0.92  91.23 
2  77.84  0.77  0.78  0.79  80.73  88.72  0.88  0.91  0.9  88.01  91.93  0.91  0.93  0.94  94.12  
3  81.21  0.81  0.82  0.83  81.24  92.18  0.92  0.92  0.91  94.05  91.3  0.91  0.92  0.93  93.05  
4  79.62  0.79  0.8  0.8  80.99  89  0.9  0.91  0.9  91.21  87.29  0.88  0.91  0.92  92.78  
5  78.28  0.78  0.79  0.8  80.88  90  0.9  0.91  0.91  93.56  89.82  0.89  0.91  0.92  92 
Dist Fun/Metrics  BAO  BGWO  BHHO
  AC (%)  SE  SP  PR  F1S (%)  AC (%)  SE  SP  PR  F1S (%)  AC (%)  SE  SP  PR  F1S (%)
Minkowski  K = 3  92.4  0.93  0.94  0.94  94.33  94.56  0.94  0.95  0.95  94.89  95.04  0.95  0.95  0.96  98.51 
K = 5  94.7  0.94  0.95  0.95  94  95.02  0.95  0.95  0.969  95.78  99.23  0.98  0.98  0.96  97.88  
K = 7  95  0.95  0.96  0.96  95.45  94.89  0.95  0.96  0.96  94.34  98.12  0.98  0.98  0.98  98.76  
K = 9  94.7  0.94  0.95  0.96  95.46  94.33  0.94  0.95  0.93  94  98.02  0.98  0.97  0.97  97.91  
Euclidean  K = 3  94.9  0.95  0.96  0.95  96  95.67  0.95  0.96  0.94  95  99.03  0.99  0.98  0.98  99.05 
K = 5  96.3  0.96  0.96  0.96  96.77  96.08  0.96  0.96  0.97  95.31  
K = 7  95.5  0.95  0.95  0.96  95.89  97.02  0.97  0.97  0.964  94.65  98.99  0.99  0.98  0.99  99.72  
K = 9  95.1  0.95  0.96  0.96  96.22  96  0.96  0.96  0.96  94.77  98.67  0.99  0.99  0.98  98.51  
Cityblock  K = 3  93.3  0.94  0.95  0.95  95.99  95.34  0.96  0.96  0.93  95.89  97.38  0.98  0.98  0.98  98.56 
K = 5  94  0.94  0.96  0.97  96.89  95.9  0.96  0.96  0.95  96.93  98.09  1  0.98  0.99  99.07  
K = 7  94.2  0.94  0.95  0.96  96  96.43  0.96  0.95  0.95  96.32  96.46  0.98  0.98  0.99  99.16  
K = 9  93.6  0.94  0.95  0.96  95.04  96.34  0.96  0.95  0.94  96  97.89  0.99  0.98  0.99  98.69 
Kernel Function/PM  BAO  BGWO  BHHO
Kernel scale  AC (%)  SE  SP  PR  F1S (%)  AC (%)  SE  SP  PR  F1S (%)  AC (%)  SE  SP  PR  F1S (%)
Linear  1  55.76  0.51  0.48  0.5  59.82  60.36  0.6  0.61  0.63  64.52  85.46  0.85  0.87  0.86  88.29 
2  64.03  0.65  0.60  0.61  68.29  72.24  0.73  0.72  0.75  75.04  91.95  0.91  0.92  0.93  92.89  
3  72.68  0.74  0.74  0.8  81.51  73.01  0.73  0.73  0.75  76.72  92.29  0.91  0.93  0.93  92.31  
4  63.4  0.59  0.62  0.64  69.54  60.15  0.6  0.61  0.63  58.98  89.3  0.89  0.92  0.91  90.96  
5  68.21  0.63  0.71  0.65  71.34  75.27  0.74  0.75  0.76  80.01  87.854  0.88  0.90  0.89  89.43  
RBF  1  69.24  0.7  0.53  0.69  72.31  76.37  0.76  0.76  0.79  81  92.56  0.92  0.92  0.93  94.56 
2  72.68  0.73  0.74  0.74  74.36  79.16  0.79  0.78  0.83  84.25  92.08  0.93  0.93  0.94  94.46  
3  71.87  0.73  0.73  0.74  73.60  79  0.79  0.79  0.82  84.04  
4  70.03  0.71  0.70  0.71  72.81  73.44  0.74  0.73  0.77  77  90.24  0.91  0.92  0.93  92.97  
5  70  0.71  0.70  0.7  72.67  78.35  0.78  0.79  0.8  83.12  90.22  0.91  0.91  0.93  0.92  
Polynomial  1  60.43  0.6  0.49  0.51  0.58  68.93  0.7  0.69  0.71  71.45  89.45  0.89  0.92  0.91  87.35 
2  63.05  0.63  0.54  0.6  0.60  82.38  0.82  0.823  0.85  90.11  89.45  0.89  0.92  0.91  87.12  
3  63.73  0.64  0.56  0.62  0.63  84.92  0.85  0.84  0.87  92.34  91.92  0.92  0.93  0.91  93.51  
4  0.59  0.6  0.57  0.61  0.62  79.27  0.8  0.81  0.81  84.98  90.77  0.91  0.93  0.91  92.13  
5  0.59  0.6  0.53  0.61  0.62  82.29  0.83  0.82  0.83  91.2  90.23  0.9  0.92  0.92  92.89
The results of the optimization techniques BAO, BGWO, and BinHHO applied to a KNN classifier with the various distance functions and differing k values are depicted in
All the specified optimization algorithms are evaluated by a fitness/convergence function. The fitness function drives the behaviour of the algorithms in the search space, exploring candidate solutions to optimize a convergence metric, which frames feature selection as an optimization problem. So,
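The exact objective is not reproduced here, but a fitness function commonly used for wrapper feature selection with binary metaheuristics (an assumption on our part, not the paper's stated formula) is a weighted sum of the classifier's error rate and the fraction of features retained, to be minimized:

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """A common wrapper feature-selection fitness (illustrative
    assumption): alpha weights classification error against the
    fraction of features kept; lower is better."""
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)
```

With alpha close to 1, accuracy dominates, and the small feature-count term breaks ties in favour of smaller subsets.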
On dataset DTS1, the BinHHO-SVM approach performed best with the RBF kernel at a kernel scale (σ) of three, achieving AC, SE, SP, PR, and F1S of 95.87%, 0.94, 0.95, 0.96, and 96.28%, respectively. On dataset DTS2, the corresponding results were 92.8%, 0.92, 0.93, 0.94, and 95.02%. The best results are shown in the tables in bold font. The well-performing KNN classifier achieved maximum classification accuracies of 99.94% and 99.95% on DTS1 and DTS2, respectively.
The proposed design exceeds modern state-of-the-art methodologies using the supervised machine learning techniques SVM and KNN, whereas entropy-based and nonlinear feature-based methods performed worse. The proposed model improved accuracy in comparison with other feature selection techniques.
This research presents a novel approach to classifying fraud and non-fraud transactions with feature selection for digital transaction fraud detection. The Synthetic Minority Oversampling Technique (SMOTE) was utilized to balance the datasets, and classification was evaluated using 10-fold cross-validation. Feature selection is performed intelligently using the optimization techniques BAO, BGWO, and BinHHO. BAO and BGWO suffer from a slower convergence rate and higher time complexity than BinHHO. The binary version of HHO (BinHHO) is a unique and efficient nature-inspired swarm-based approach and achieved good accuracy on both datasets. The following limitations of the proposed work provide suggestions for future research:
The proposed approach is best suited for blockchain applications in banking, finance, and similar advanced fields.
Other feature selection methods remain to be studied.
The approach can be applied to deep learning models.
The authors received no specific funding for this study.
The authors declare they have no conflicts of interest to report regarding the present study.