With the popularity of online payment, accurate credit card fraud detection has become a pressing problem. Since the emergence of the adaptive boosting algorithm (Adaboost), this method has been widely used for credit card fraud detection, but traditional Adaboost is prone to overfitting in the presence of noisy samples. To alleviate this, this paper proposes a new idea: using the number of consecutive misclassifications of a sample to identify noisy samples, while constructing a penalty factor to reconstruct the sample weight assignment. First, theoretical analysis shows that the traditional Adaboost method overfits on a noisy training set, which degrades classification accuracy. To address this, a penalty factor built from the number of consecutive misclassifications of a sample is used to reconstruct the sample weight assignment so that the classifier does not over-focus on noisy samples, and its reasonableness is demonstrated. Then, by comparing the penalty strength of the three different penalty factors proposed in this paper, the most reasonable penalty factor is selected. Meanwhile, to keep the training time of the constructed model within practical limits, the Adaboost algorithm with adaptive weight trimming (AWTAdaboost) is adopted, yielding the penalty-factor-based AWTAdaboost (PF_AWTAdaboost). Finally, PF_AWTAdaboost is validated experimentally against other traditional machine learning algorithms on a credit card fraud dataset and on other datasets. The results show that PF_AWTAdaboost outperforms the other methods on the credit card fraud dataset in detection accuracy, model recall, and robustness, and also shows excellent generalization performance on the other datasets.
The experimental results show that the PF_AWTAdaboost algorithm has better classification performance.

With the rise of the electronic payment era, more and more people use credit cards for purchases and transfers. Electronic payment has undoubtedly brought great convenience to people's daily life and work, but at the same time the risk of theft of users' personal information is also increasing, leading to a year-by-year rise in credit card fraud cases. The prevention of credit card fraud has therefore become a hot topic of discussion in academia and industry: a deep learning (DL) based approach to the problem has been developed using a Kaggle dataset, feeding inputs into a CNN with class weights derived by an inverse frequency method to address the class imbalance problem, while DL and machine learning (ML) methods are applied to verify the robustness and effectiveness of the system [

Boosting is an important class of machine learning algorithms; its basic idea is to combine a series of weak classifiers, each with a different weight, into a strong classifier [
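The weighted-combination idea can be sketched as follows. This is the standard Adaboost decision rule, sign of the weighted sum of weak-classifier outputs; the stump classifiers and weights below are purely illustrative.

```python
def strong_classify(x, base_classifiers, alphas):
    """Weighted vote of weak classifiers: sign(sum_t alpha_t * h_t(x)).

    Standard Adaboost combination rule, shown as a minimal sketch:
    each h_t returns a label in {-1, +1}, and alpha_t is the weight
    earned by h_t during training.
    """
    s = sum(a * h(x) for h, a in zip(base_classifiers, alphas))
    return 1 if s >= 0 else -1

# toy weak learners: hypothetical threshold stumps on a single feature
h1 = lambda x: 1 if x > 0.3 else -1
h2 = lambda x: 1 if x > 0.6 else -1
print(strong_classify(0.5, [h1, h2], [0.8, 0.4]))  # → 1 (h1 outweighs h2)
```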

At the same time, given the large number of samples in real credit card fraud detection, traditional Adaboost incurs long training times, so it is not appropriate to apply Adaboost directly to the credit card fraud scenario. Research on reducing the time consumption of the Adaboost algorithm mainly focuses on pruning operations that screen out data of little value, such as static weight trimming Adaboost (SWTAdaboost) [
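The pruning idea can be sketched as follows. Weight trimming skips, for one training round, samples whose weight has fallen so low that they barely influence the next base classifier; the cut-off rule used here (a fraction 1/k of the uniform weight 1/n) is a hypothetical illustration, not the exact SWTAdaboost or AWTAdaboost threshold.

```python
def trim_samples(weights, k):
    """Return indices of samples kept for the next training round.

    Sketch of weight trimming: a sample is dropped for this round if
    its weight is below 1/(n*k), i.e. a fraction 1/k of the uniform
    weight.  This threshold rule is an assumption for illustration.
    """
    n = len(weights)
    threshold = 1.0 / (n * k)
    return [i for i, w in enumerate(weights) if w >= threshold]

# with n = 5 and k = 2 the cut-off is 0.1, so the two 0.05 samples are skipped
print(trim_samples([0.4, 0.3, 0.2, 0.05, 0.05], 2))  # → [0, 1, 2]
```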

We systematically analyze the drawbacks of the traditional Adaboost algorithm in the presence of noisy samples, and propose a method that optimizes the algorithm by constructing penalty factors from the number of consecutive misclassifications of samples.

By comparing the penalty strength of the three types of penalty factors constructed in this paper, the best penalty factor is determined. It is then introduced into the AWTAdaboost algorithm to obtain the final optimized algorithm, which is applied to the credit card fraud detection scenario.

The rest of the paper is organized as follows: Section 2 provides a theoretical analysis and selection of the introduced penalty factors, then introduces our improved algorithm, the PF_AWTAdaboost algorithm; Section 3 designs experiments on the credit card fraud dataset and other datasets to compare with other algorithms; conclusions are drawn in Section 4.

This section first motivates the concept of the penalty factor by analyzing the traditional Adaboost algorithm and compares the three nonlinear penalty functions proposed in this paper, then migrates the penalty factor to the AWTAdaboost algorithm, describes the PF_AWTAdaboost procedure, and finally analyzes the convergence of the resulting algorithm.

Clearly, in the traditional Adaboost algorithm, if the training set contains noise, the weights of noisy samples that are hard to classify correctly keep increasing with the number of iterations. The base classifiers then pay too much attention to the noisy samples and make wrong decisions, degrading the performance of the final strong classifier. Reducing the weight of noisy samples has therefore become a mainstream direction for improvement. This paper proposes to use the number of consecutive misclassifications to distinguish normal samples from noisy samples: noisy samples are harder to classify correctly than normal samples, so they are misclassified more often during the iterations. To avoid treating an occasionally misclassified normal sample as noise, we use the number of consecutive misclassifications rather than the total count. On this basis, we define the penalty factor A(e), where e is the number of consecutive misclassifications and A(e) decreases as e grows. After introducing the penalty factor,

(When the classification is correct, let

By comparison, it is found that the weight of noisy samples under

With the introduction of the penalty factor

Since A(e)

Thus making

Then by comparing

Therefore, in the final classification decision, the classifier will not over-learn noisy samples, and base classifiers with lower error rates receive greater weights under the Adaboost algorithm with the penalty factor than under the traditional Adaboost algorithm.
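The weight-reassignment scheme described above can be sketched as follows. The multiplicative update form and the particular penalty function are our reading of the idea, not the paper's exact (elided) formulas: a misclassified sample's weight grows by exp(alpha) as in standard Adaboost, but is damped by A(e) once it has been misclassified e rounds in a row, and the streak counter resets on a correct classification.

```python
import math

def update_weights(weights, correct, streaks, alpha, A):
    """One round of a penalty-factor Adaboost weight update (sketch).

    weights : current sample weights
    correct : bool per sample, True if the base classifier was right
    streaks : consecutive-misclassification counts e_i (updated in place)
    alpha   : weight of the current base classifier
    A       : penalty function, decreasing in e, with A(0) == 1 assumed
    """
    new_w = []
    for i, (w, ok) in enumerate(zip(weights, correct)):
        if ok:
            streaks[i] = 0                          # streak broken: no penalty
            new_w.append(w * math.exp(-alpha))
        else:
            streaks[i] += 1                         # extend the streak
            new_w.append(w * math.exp(alpha) * A(streaks[i]))
    z = sum(new_w)                                  # keep weights a distribution
    return [w / z for w in new_w], streaks

# hypothetical ln-type penalty: A(e) = 1 / ln(e + c) with c = e (Euler's number),
# so that A(0) = 1/ln(e) = 1 and A decreases as the streak grows
A = lambda e: 1.0 / math.log(e + math.e)
```

Note that a persistently misclassified sample still gains weight relative to correctly classified ones, but far less than under the unpenalized update, which is exactly the damping effect the penalty factor is meant to provide.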

Regarding the selection of A(e), three nonlinear continuous penalty functions are proposed under the constraints stated in this paper. They are

The following analysis examines whether the error rate of the AWTAdaboost algorithm still meets the requirements after the penalty factor is introduced.

The error rate of the original AWTAdaboost algorithm is

Let

With the introduction of the penalty factor,

Due to

Combining

And because by the definition of the sample distribution there is

Also according to

According to

Because

Compare

The final error rate of PF_AWTAdaboost algorithm on the training set can be obtained according to

Therefore, it can be concluded that the error rate of the PF_AWTAdaboost algorithm has an upper bound on the training set, and this upper bound decreases exponentially as the number of iterations increases, so the AWTAdaboost algorithm still converges after the penalty factor is introduced.

The dataset used in this paper is the credit card fraud dataset provided by the Kaggle platform, which contains transactions made by European cardholders via credit cards over a two-day period in September 2013. There were 492 fraudulent transactions out of 284,807 transactions. The dataset has been processed by PCA and the details are shown in

Features | Feature description |
---|---|
V1-V28 | Principal components obtained using PCA |
Time | Number of seconds elapsed between each transaction and the first transaction in the dataset |
Amount | Transaction amount |
Class | Takes the value 1 in case of fraud, 0 otherwise |

Precision and recall are commonly used metrics for evaluating the performance of binary classification algorithms. We classify the actual sample value and the classifier's prediction as follows: a positive sample predicted as positive is a true positive (TP); a negative sample predicted as positive is a false positive (FP); a negative sample predicted as negative is a true negative (TN); and a positive sample predicted as negative is a false negative (FN). This defines the precision rate
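The four counts and the two rates built from them can be computed directly; here fraud is the positive class (label 1), with precision = TP/(TP+FP) and recall = TP/(TP+FN) as in the standard definitions.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall from TP/FP/FN counts.

    Labels are 1 for the positive (fraud) class and 0 otherwise.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 3 actual frauds; the classifier catches 2 of them and raises 1 false alarm
p, r = precision_recall([1, 0, 1, 1, 0, 0], [1, 1, 0, 1, 0, 0])
print(p, r)  # → 0.666..., 0.666...
```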

In the problem of credit card fraud detection, it is the minority class of samples that is of concern. It is therefore very important to identify the few fraudulent transactions or users with high accuracy to avoid financial losses. Traditional classification criteria may focus more on the majority class: the accuracy rate remains high even if all minority class samples are predicted incorrectly, so traditional classification metrics are not suitable for imbalanced classification problems. To evaluate classifiers more comprehensively, scholars have summarized and proposed two evaluation criteria for imbalanced classification problems: the F-measure and the ROC (Receiver Operating Characteristic) curve.

The F-measure is an evaluation criterion that combines precision and recall; it is defined as

The ROC curve plots FP/(FP + TN) (the false positive rate) on the horizontal axis against TP/(TP + FN) (the true positive rate) on the vertical axis, showing how the two rates change as the decision threshold varies. The closer the ROC curve is to the upper left corner, the higher the true positive rate the classifier achieves at a lower false positive rate. However, the ROC curve only depicts this trade-off and cannot by itself evaluate the classifier quantitatively [
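Both summary numbers used later in the experiments can be computed compactly. The F1 value is the F-measure with equal weight on precision and recall, and the AUC (the quantitative summary of the ROC curve) equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one; the rank-based computation below is a standard equivalent of integrating the curve, shown as a minimal sketch.

```python
def f1(precision, recall):
    """F-measure with beta = 1: harmonic mean of precision and recall."""
    s = precision + recall
    return 2 * precision * recall / s if s else 0.0

def auc(y_true, scores):
    """AUC via the Mann-Whitney statistic: the fraction of
    positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(f1(0.9, 0.8))                              # → 0.847...
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.35, 0.1]))  # → 1.0 (perfect ranking)
```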

In training the PF_AWTAdaboost model, there are two independent parameters: the sample trimming threshold k and the penalty factor parameter c. Since the ln-type penalty factor is selected in this paper, c here denotes the argument of the logarithm. Some experiments are first performed to find the optimal values of these parameters.

Here we use 40 base classifiers and vary k from 5 to 10. Since the value of k determines the number of trimmed samples and thus affects both running time and classification, we select the better-performing value by observing the F1 value for each k on the test set. The results for different k values are shown in

The F1 value is highest for k = 6 and k = 9. However, since a higher k trims more samples and is thus more likely to cause decision errors, k = 9 is discarded and k = 6 is chosen for the experiments.

The penalty factor selected in this paper is

From

From

The highest F1 value of the PF_AWTAdaboost algorithm is 0.9421, which is 0.0216 higher than the traditional Adaboost algorithm, 0.0121 higher than the AWTAdaboost algorithm, and 0.1146 higher than the SVM algorithm. The highest AUC value of the PF_AWTAdaboost algorithm is 0.9910, which is 0.0027 higher than the traditional Adaboost algorithm, 0.0010 higher than the AWTAdaboost algorithm, and 0.276 higher than the SVM algorithm.

By comparing the F1 and AUC values of each algorithm, it can be seen that the classifier model trained by the PF_AWTAdaboost algorithm has better overall performance than the traditional Adaboost and AWTAdaboost algorithms on the credit card fraud problem.

To verify the universality of the proposed algorithm, the algorithms were tested on the Horse, Wisconsin, Breast cancer, Adult, and custom datasets, as well as 10 datasets selected from the Kaggle platform. The custom dataset was created to test the gradient ascent algorithm; it has a smaller sample size (only 100 samples in both the training and test sets) and is more prone to classification errors than the other datasets. The same experiments as before were performed on these datasets, and the best values are bolded.

Dataset | SVM | Adaboost | AWTAdaboost | PF_AWTAdaboost |
---|---|---|---|---|
Network_ads | 0.766 | 0.910 | 0.921 | |
Healthcare-dataset | 0.784 | 0.868 | 0.868 | |
Binary-classification | 0.642 | 0.639 | 0.646 | |
Credit_card | 0.538 | 0.944 | 0.945 | |
Airline passenger satisfaction | 0.531 | 0.966 | 0.967 | |
Heart | 0.600 | 0.863 | 0.866 | |
Heart2 | 0.659 | 0.853 | 0.854 | |
Water | 0.530 | 0.564 | 0.564 | |
Titanic | 0.802 | 0.889 | 0.889 | |
Banking-dataset | 0.801 | 0.871 | | |
Horse | 0.709 | 0.793 | 0.778 | |
Wisconsin | 0.990 | 0.990 | 0.990 | |
Breast cancer | 0.800 | 0.765 | 0.814 | |
Customization | 0.832 | 0.829 | 0.848 | |
Adult | 0.593 | 0.870 | 0.869 | |

Dataset | SVM | Adaboost | AWTAdaboost | PF_AWTAdaboost |
---|---|---|---|---|
Network_ads | 0.404 | 0.756 | 0.787 | |
Healthcare-dataset | 0.437 | 0.626 | 0.675 | |
Binary-classification | 0.622 | 0.537 | 0.559 | |
Credit_card | 0.612 | 0.798 | | |
Airline passenger satisfaction | 0.636 | 0.921 | | |
Heart | 0.308 | 0.732 | 0.722 | |
Heart2 | 0.623 | 0.882 | 0.882 | |
Water | 0.260 | 0.260 | 0.129 | |
Titanic | 0.508 | | | |
Banking-dataset | 0.734 | 0.821 | | |
Horse | 0.444 | 0.828 | 0.833 | |
Wisconsin | 0.788 | 0.935 | | |
Breast cancer | 0.400 | 0.667 | 0.645 | |
Customization | 0.750 | 0.750 | 0.772 | |
Adult | 0.665 | 0.774 | 0.776 | |

Dataset | SVM | Adaboost | AWTAdaboost | PF_AWTAdaboost |
---|---|---|---|---|
Network_ads | 0.833 | 0.923 | 0.962 | |
Healthcare-dataset | 0.531 | 0.548 | 0.568 | |
Binary-classification | 0.793 | 0.821 | 0.857 | |
Credit_card | 0.441 | 0.876 | 0.876 | |
Airline passenger satisfaction | 0.516 | 0.906 | 0.909 | |
Heart | 0.571 | 0.757 | 0.765 | |
Heart2 | 0.821 | 0.873 | 0.873 | |
Water | 0.414 | 0.531 | 0.531 | |
Titanic | 0.778 | 0.778 | 0.778 | |
Banking-dataset | 0.791 | 0.866 | 0.864 | |
Horse | 0.824 | 0.878 | 0.880 | |
Wisconsin | 0.952 | 0.935 | 0.938 | |
Breast cancer | 0.750 | 0.625 | 0.714 | |
Customization | 0.763 | 0.780 | | |
Adult | 0.502 | 0.846 | 0.841 | |

From

Therefore, combining these three tables shows that, on these 15 datasets, the PF_AWTAdaboost algorithm exhibits superior generalization ability compared with the other three algorithms, especially in improving the AUC value. This demonstrates that the improvement proposed in this paper is reasonable across different scenarios.

To reduce the impact of noisy samples on the classification performance of the Adaboost algorithm in credit card fraud scenarios, this paper proposes a new method that counts the number of consecutive misclassifications of each sample and constructs penalty factors to change the original Adaboost sample weight assignment. By comparing the penalty strength of the three types of penalty factors proposed in this paper, the best penalty factor is selected. The penalty factor is then migrated to AWTAdaboost to form PF_AWTAdaboost, which is verified by formula derivation to remain convergent. Finally, PF_AWTAdaboost is compared with three other traditional machine learning algorithms on the credit card fraud dataset and other datasets. On the credit card fraud dataset, the F1 and AUC values of the PF_AWTAdaboost algorithm are higher than those of the other algorithms, with improvements of 0.0121 and 0.0010, respectively, over the AWTAdaboost algorithm, and 0.0216 and 0.0027, respectively, over the Adaboost algorithm. The PF_AWTAdaboost algorithm also shows excellent generalization performance on the UCI and Kaggle datasets, verifying that the proposed improvement is advantageous in both credit card fraud scenarios and other scenarios.