In the insurance sector, a vast client base generates a massive volume of data every day. Decision makers and business analysts emphasize that acquiring new customers is costlier than retaining existing ones. The success of retention initiatives depends not only on how accurately churners are forecast but also on when the forecast is made. Previous work on churn forecasting presented models that anticipate churn quarterly or monthly, with an emphasis on customers' static behavior. This paper's objective is to predict churn daily based on dynamic variations in client behavior. Well-trained models that identify potential churners help insurance companies decide how to retain customers and also reveal areas for improvement. Identifying and analysing clients who are likely to churn in this way reduces the cost of support and maintenance. A Binary Golden Eagle Optimizer (BGEO) is used in a preprocessing step to select optimal features from the datasets. The research then characterizes each customer's daily behavior using several models, namely RFM (Recency, Frequency, Monetary), Multivariate Time Series (MTS), a Statistics-based Model (SM), Survival Analysis (SA), deep learning (DL) methods (Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU)), and a Customized Extreme Learning Machine (CELM), and uses this description to frame the problem of daily forecasting. All models produced good overall results with only slight variations in the performance measures, and the proposed CELM outperforms all other models in terms of accuracy (96.4%).
In Customer Relationship Management (CRM), client churn refers to the loss of customers: it occurs when a customer stops using a product or service provided by the organization. Customer churn is also called customer attrition, and it is among the most critical issues that reduce a company's profit. The term also covers the business intelligence procedures used to locate customers who intend to switch from one company to another. In contrast, the retention rate measures the proportion of clients retained [
Churn forecasting models aid the development of customer intervention strategies by identifying churners early. This is essentially a classification problem: each customer must be classified as a potential churner or a non-churner [
Controlling customer churn is one of the primary growth pillars of the financial services sector. Because competition in the market is tough, customers can choose among many service providers and are free to switch to a competitor after a single bad experience or several. Much research and many modeling techniques have investigated the features that affect customer churn, but in most cases the approaches are not technical or specific enough to resolve the problem.
When customers migrate, companies face not only the loss of revenue from those customers but also the high cost of acquiring new ones. No matter how excellent a firm's product or service is, it must keep track of its customer churn rate. Customers are the core of the business, so businesses must understand churn if they want to grow and adapt to meet their buyers' requirements [
In the insurance industry, a customer is described as a churner when they stop performing income-generating actions for 30 days just after policy coverage expires. These 30 days are referred to as the inactivity period. To develop a churn forecasting model, the customer's behavior during the period preceding the inactivity period, known as the observation period, is examined. In other words, characteristics describing the customer during the observation period are fed into a binary classification model trained with the labels (churn, non-churn). The model's output can be expressed formally as follows:

$$\mathrm{ChurnScore}(X, d) = p\left(\text{class} = \text{Churner} \mid X_{[d-30,\, d-1]}\right)$$
Here d is the present day and X is a multivariate time series representing the client's daily behavior during the observation period. The method labels customers with a high churn score (≥ 0.9) as unsafe, indicating that no retention initiative should target them in
When this period of inactivity or behavioral change exceeds a certain threshold, the customer is considered to have churned. The threshold on the length of inactivity used in this process is called the timeframe.
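The labelling scheme above can be sketched as follows. This is a minimal illustration, assuming each customer's history is held as a NumPy array of daily values with one row per day; the function names and array layout are hypothetical, not the paper's implementation. A customer is labelled a churner at day d if there is no income-generating activity on days d to d+29, and the model input is the slice X[d-30, d-1].

```python
import numpy as np

OBSERVATION_DAYS = 30  # length of the observation window X[d-30, d-1]
INACTIVITY_DAYS = 30   # inactivity period that defines a churner

def observation_window(daily, d, length=OBSERVATION_DAYS):
    """Return days d-30 .. d-1 (inclusive) of a (n_days, n_features) array.

    `daily` is a hypothetical layout: one row per day, one column per
    behavioural feature of a single customer.
    """
    if d < length:
        raise ValueError("not enough history before day d")
    return daily[d - length:d]

def churn_label(activity, d, inactivity=INACTIVITY_DAYS):
    """1 (churner) if there is no activity on days d .. d+inactivity-1, else 0.

    `activity` is a 1-D array of daily income-generating activity totals.
    """
    future = activity[d:d + inactivity]
    if len(future) < inactivity:
        raise ValueError("inactivity period extends past the available data")
    return int(not np.any(future > 0))
```

Pairing `observation_window(daily, d)` with `churn_label(activity, d)` for every day d yields one (input, label) training example per customer-day, which is what makes daily rather than monthly scoring possible.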
1) Addressing the problem of daily churn forecasting based on dynamic, day-to-day customer behavior.
2) Proposing six approaches to daily churn prediction based on a customer feature selection strategy and deep learning approaches.
3) Showing that the proposed CELM outperforms all other methods in terms of accuracy. A customized term is introduced into the existing ELM to reduce structural risk, yielding the Customized Extreme Learning Machine (CELM).
4) Comparing the effectiveness of these methods with prior monthly forecasts on a large health insurance dataset from Kaggle's catalog, which demonstrated that our model forecasts churners earlier and more accurately than previous approaches.
This paper is organized as follows. Section 2 summarises related work on churn prediction. Section 3 proposes six models based on customer daily behaviour. Section 4 presents the experimental findings, and Section 5 concludes the paper.
Óskarsdóttir et al. (2018) [
Shah et al. [
| S.no | Author | Method | Advantages | Accuracy (%) |
|---|---|---|---|---|
| 1 | Raja et al. [ | Variety of feature selection methods | Boosts the accuracy of the built-in churn prediction design | 72 |
| 2 | Ahmad et al. [ | Logistic regression and K-Nearest Neighbour (KNN) | Forecasts churn in the telecom business | 88 |
| 3 | Fridrich et al. [ | Random Forest (RF), Radial Basis Function (RBF) Support Vector Machine (SVM) | Efficient at incorporating feature selection | 68 |
| 4 | Infante et al. [ | Social network | Distinguishes churners from non-churners | 82 |
| 5 | Amin et al. [ | Customer churn prediction (CCP) technique | Exploits the distance factor | 78 |
| 6 | Schena et al. [ | CRM | Prevents turnover before it occurs | 65 |
| 7 | Stripling et al. [ | Genetic algorithm (GA) | Offers threshold-independent recall and accuracy | 85 |
| 8 | Devriendt et al. [ | Predictive frameworks | Achieved the maximum profit metric | 79 |
| 9 | Liu et al. [ | Unstructured customer churn | Client segmentation | 72 |
| 10 | Pustokhina et al. [ | Sunflower optimization (SFO) | Efficient tuning in machine learning | 83 |
Early research on churn prediction concentrated on monthly forecasts based on static or dynamic customer behavior. These studies have two main drawbacks. First, monthly churn projections come too late: a monthly model flags customers who effectively left early in the previous month as churners only in the following month. Second, considering only monthly behavior ignores day-to-day variations in customer behavior within the month, which can reduce the discriminative and predictive power of the model.
Thus, the goal of this work is to forecast customer churn from daily dynamic behavior in the health insurance market. Previous models mainly focus on the monthly churn rate, which makes it very difficult to identify churners in advance. To overcome this difficulty, this research develops six approaches for predicting daily customer churn from day-wise client behavior. The proposed Customized Extreme Learning Machine (CELM) model achieves higher accuracy than the other models.
Insurance companies compete intensely for business all around the globe. Because competition among health insurance businesses has recently increased, clients are migrating. It is unclear whether there is evidence of switching behavior and which customers move to a competitor. With so much information recorded from so many clients, it is hard to study and understand why a consumer chooses to switch insurance carriers. In an industry where client retention is important and acquiring new clients is the costlier option, insurance companies depend on data to evaluate client behavior and minimize losses. Our Customized Extreme Learning Machine (CELM) model is therefore used to investigate customer churn detection on insurance data, as shown in
The first stage is to prepare the dataset so that the models receive proper input. In this phase, raw data is retrieved from all datasets, and various data preprocessing techniques are used to transform it into a useful and efficient format. Finally, the valid data is supplied to the learning models.
Data splitting is a method of dividing a dataset into at least two subsets, referred to as 'training' (or 'calibration') and 'testing' (or 'prediction'). The training dataset holds the data used to fit the model. The testing dataset holds the data used to evaluate the trained and validated model; it indicates how effective the overall model is and how often it makes incorrect predictions. In this study, the training set consists of 33,908 records, whereas the testing set contains 11,303 records.
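The reported sizes are consistent with a shuffled 75/25 split of 45,211 records (33,908 + 11,303). The sketch below reproduces those counts, assuming the test size is rounded up, which is the rule scikit-learn's `train_test_split` applies to a fractional `test_size`; the function name and seed are illustrative only, and the paper does not state how its split was drawn.

```python
import math
import random

def train_test_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle record indices and split them into (train, test) lists.

    The test size is rounded up, so 0.25 * 45,211 = 11,302.75 becomes 11,303.
    """
    n_test = math.ceil(test_fraction * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_indices(33_908 + 11_303)
print(len(train_idx), len(test_idx))  # 33908 11303
```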
To handle the feature selection problem, this research uses the Binary Golden Eagle Optimizer (BGEO). The Golden Eagle Optimizer (GEO) algorithm operates in a continuous domain, whereas feature selection is a discrete (binary) problem, so the continuous search space must be mapped onto a finite interval and from there to binary values. A transfer function performs this transformation. Time-varying flight lengths are also used to balance exploitation and exploration. BGEO has fewer parameters to tune and gives better results than existing algorithms. The mapping from the continuous to the binary space is given in
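Because the paper's transfer function is not reproduced here, the sketch below uses the common S-shaped (sigmoid) transfer function as an assumed stand-in: each continuous coordinate of an eagle's position is mapped to a probability in [0, 1], and a feature is selected when a uniform random draw falls below that probability. The wrapper objective `fitness` (weighted classification error plus the fraction of selected features) is also an assumption, a form often used with binary metaheuristics. The GEO attack and cruise updates themselves are omitted.

```python
import numpy as np

def sigmoid_transfer(position):
    """S-shaped transfer function: maps continuous positions to [0, 1]."""
    return 1.0 / (1.0 + np.exp(-position))

def binarize(position, rng):
    """Select feature j (bit = 1) with probability sigmoid_transfer(position[j])."""
    probs = sigmoid_transfer(position)
    return (rng.random(position.shape) < probs).astype(int)

def fitness(error_rate, mask, alpha=0.99):
    """Assumed wrapper objective (lower is better): trade the classifier's
    error rate against the fraction of features kept."""
    return alpha * error_rate + (1 - alpha) * mask.sum() / mask.size

rng = np.random.default_rng(42)
eagle_position = np.array([-6.0, 0.0, 6.0, 2.5])  # illustrative continuous position
mask = binarize(eagle_position, rng)
selected_features = np.flatnonzero(mask)          # indices of features kept
```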
RFM is a popular model for characterising client interactions and overall behavior. It is a straightforward but effective technique for quantifying client relationships in terms of RFM values.
1) Recency (R) is the time elapsed between the customer's latest purchase and the time the data is collected.
2) Frequency (F) is the number of purchases made by a single client within a specified time period.
3) Monetary (M) denotes the amount of money spent by a client over a specific time.
RFM values for every temporal feature Fi of X were extracted at two levels:
The entire observation window, 30 days long.
Two successive sub-windows of the observation window, each 15 days long.
The complete RFM feature vector for temporal feature Fi can therefore be defined as in
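The two-level RFM extraction can be sketched as follows. For daily data the sketch assumes these definitions, which the text does not spell out: recency is the number of days between the last active day and the end of the window (the window length if the customer was never active), frequency is the number of active days, and monetary is the total value. Each temporal feature yields nine values here (three windows times three measures); the exact composition of the paper's 120-value vector is not detailed, so this layout is an assumption.

```python
import numpy as np

def rfm(series):
    """[recency, frequency, monetary] for one window of daily values."""
    series = np.asarray(series, dtype=float)
    active = np.flatnonzero(series > 0)          # days with any activity
    recency = len(series) - 1 - active[-1] if active.size else len(series)
    return [float(recency), float(active.size), float(series.sum())]

def rfm_two_levels(series, n_sub=2):
    """RFM over the whole 30-day window, then over each 15-day sub-window."""
    series = np.asarray(series, dtype=float)
    features = rfm(series)
    for part in np.array_split(series, n_sub):
        features.extend(rfm(part))
    return np.array(features)

# A customer active only on days 10-14 of a 30-day observation window.
example = [0] * 10 + [5] * 5 + [0] * 15
print(rfm_two_levels(example))  # [15. 5. 25. 0. 5. 25. 15. 0. 0.]
```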
Applying the RFM operations to client X in dataset D returns a vector of 120 values that capture the customer's temporal behaviour during the observation period. As a result, the transformed dataset D is given in
The converted dataset
This process yields a set D of MTS with r = 10 variables and length m = 30; i.e., every observation (client) X ∈ D comprises 10 time series of length 30 that correspond to the 10 temporal feature values in
Where
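The MTS representation amounts to a three-dimensional array of shape (clients, r = 10 features, m = 30 days). The sketch below assembles it from a hypothetical long-format activity log of `(client_id, day, feature_index, value)` tuples; the record format and the zero-filling of missing client-days are assumptions.

```python
import numpy as np

def build_mts(records, client_ids, n_features=10, n_days=30):
    """Assemble an array of shape (n_clients, n_features, n_days).

    `records` holds (client_id, day, feature_index, value) tuples, where
    `day` is a 0-based offset inside the observation window. Values for the
    same client, feature, and day are summed; missing entries stay 0.
    """
    row = {cid: i for i, cid in enumerate(client_ids)}
    mts = np.zeros((len(client_ids), n_features, n_days))
    for cid, day, feat, value in records:
        mts[row[cid], feat, day] += value
    return mts

records = [("a", 0, 0, 1.5), ("a", 29, 9, 2.0), ("b", 5, 3, 4.0), ("a", 0, 0, 0.5)]
mts = build_mts(records, client_ids=["a", "b"])
flat = mts.reshape(len(mts), -1)  # 300 values per client for flat classifiers
```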
Instead of RFM parameters, this model summarises each behavioral trait with several summary statistics: minimum, maximum, mean, variance, first quartile, second quartile (median), third quartile, skewness, and kurtosis. For each activity type, these values are computed at two levels, over the entire observation window (30 days) and over each 15-day sub-window, giving a vector of 36 attributes for each behavior type. Logistic regression (LR) and random forest (RF) classifiers were tested, and the RF classifier with 100 trees and max depth = 15 performed best. The raw values are combined with the statistical feature vector before being fed into the RF, as shown in
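The nine summary statistics can be computed with NumPy alone, as sketched below. The sketch assumes population variance, Fisher-Pearson skewness, and excess kurtosis (kurtosis minus 3), and defines both shape statistics as 0 for a constant window; the paper does not state which variants it uses. It concatenates the statistics for the whole window and each 15-day sub-window; the exact layout of the paper's 36-attribute vector is not specified, so this arrangement is an assumption.

```python
import numpy as np

STAT_NAMES = ["min", "max", "mean", "variance", "q1", "median", "q3",
              "skewness", "kurtosis"]

def summary_stats(window):
    """Nine summary statistics of one window of daily values."""
    x = np.asarray(window, dtype=float)
    mean = x.mean()
    centred = x - mean
    m2 = np.mean(centred ** 2)                          # population variance
    if m2 > 0:
        skew = np.mean(centred ** 3) / m2 ** 1.5        # Fisher-Pearson skewness
        kurt = np.mean(centred ** 4) / m2 ** 2 - 3.0    # excess kurtosis
    else:
        skew = kurt = 0.0                               # constant window
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return np.array([x.min(), x.max(), mean, m2, q1, median, q3, skew, kurt])

def statistical_features(series, n_sub=2):
    """Statistics over the whole window followed by each sub-window."""
    series = np.asarray(series, dtype=float)
    parts = [series] + np.array_split(series, n_sub)
    return np.concatenate([summary_stats(p) for p in parts])

print(dict(zip(STAT_NAMES, summary_stats([1, 2, 3, 4, 5]).round(2))))
```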
Survival analysis is a collection of techniques for assessing the time until a particular event occurs. It is commonly used to model customer churn, but it applies to virtually any problem in which the time to the event is observed. The survival function S(t) gives the probability that a subject survives beyond time t. T is a continuous random variable whose cumulative distribution function (CDF) is F(t), so that S(t) = P(T > t) = 1 − F(t). According to
In general, any distribution can be used to represent F(t), and the appropriate distribution is usually chosen using domain knowledge about the event. The exponential, Weibull, log-normal, log-logistic, gamma, and exponential-logarithmic distributions are all frequently used. In
The underlying distribution can also be represented by the cumulative hazard function, as given in
Parts of
Churn prediction can be viewed as a special case of SA. In churn forecasting, the time t is fixed, and the primary objective is to decide whether or not the subject leaves before time t, which conforms to the definition of the equation.
The Kaplan-Meier estimator is a nonparametric estimator of the survival function, developed by Kaplan and Meier in 1958.
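The Kaplan-Meier estimator computes S(t) as the product, over distinct event times t_i ≤ t, of (1 − d_i / n_i), where d_i is the number of customers who churned at t_i and n_i is the number still at risk (not yet churned or censored) just before t_i. The sketch below implements this product directly with NumPy; libraries such as lifelines provide a ready-made `KaplanMeierFitter`. The input arrays are illustrative.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    durations: observed time per customer (days until churn, or until
               observation ended for customers who did not churn).
    events:    1 if churn was observed, 0 if the customer is censored.
    Returns (times, survival) with survival[k] = S(times[k]).
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    times = np.unique(durations[events == 1])       # distinct churn times
    survival, s = [], 1.0
    for t in times:
        at_risk = np.sum(durations >= t)                    # n_i
        churned = np.sum((durations == t) & (events == 1))  # d_i
        s *= 1.0 - churned / at_risk
        survival.append(s)
    return times, np.array(survival)

times, surv = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
# times -> [1. 2. 3.], surv -> approximately [0.8 0.6 0.3]
```

Censored customers (event 0) never cause a drop in S(t) themselves, but they still count in the at-risk set up to their censoring time.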
An RNN allows cycles to form among its hidden units. This lets the RNN retain an internal memory of earlier inputs, making it suitable for modelling sequential data [
LSTM architectures, developed by Hochreiter and Schmidhuber in 1997, are designed to learn long-term dependencies, allowing them to retain memory over longer periods than classic RNNs. The difference lies in the LSTM cell, which has extra components and a more complex structure: a forget gate, an input gate, and an output gate, together with a cell state and a hidden state that these gates update. The forget gate controls which information is discarded from the cell state, the input gate controls how much of the new candidate value is added, and the output gate determines the cell's new hidden state.
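One LSTM time step can be written out in NumPy to show how the three gates act on the cell and hidden states. This is a minimal sketch; the stacking order of the gate weights (forget, input, candidate, output) is an assumption, and real frameworks use their own layouts.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    x: input (D,), h: hidden state (H,), c: cell state (H,)
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked in the assumed order forget, input, candidate, output.
    """
    H = h.shape[0]
    z = W @ x + U @ h + b
    f = sigmoid(z[0:H])            # forget gate: what to drop from c
    i = sigmoid(z[H:2 * H])        # input gate: how much new content to write
    g = np.tanh(z[2 * H:3 * H])    # candidate cell content
    o = sigmoid(z[3 * H:4 * H])    # output gate: what part of c to expose
    c_new = f * c + i * g          # updated cell state
    h_new = o * np.tanh(c_new)     # updated hidden state
    return h_new, c_new
```

Running `lstm_step` over the 30 daily vectors of an observation window, carrying `h` and `c` forward, produces the final hidden state that a classifier head would score.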
The structure of the GRU cell is closely related to that of the LSTM, but the GRU cell has only two gates: the reset gate and the update gate. The update gate determines what information to discard and what new information to add, whereas the reset gate determines how much of the previous state to forget. Unlike the LSTM cell, the GRU has no separate cell state and uses the hidden state for the same purpose. Because of its simpler structure, it usually trains faster than the LSTM.
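A single GRU step can be sketched in the same way. The sketch follows the convention used by PyTorch's GRU, simplified to one bias vector: the reset gate scales the recurrent contribution to the candidate state, and the update gate blends the candidate with the previous hidden state. The weight stacking order (reset, update, candidate) is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, W, U, b):
    """One GRU time step.

    W: (3H, D), U: (3H, H), b: (3H,), stacked as reset, update, candidate.
    """
    H = h.shape[0]
    Wx = W @ x
    r = sigmoid(Wx[:H] + U[:H] @ h + b[:H])                # reset gate
    z = sigmoid(Wx[H:2 * H] + U[H:2 * H] @ h + b[H:2 * H])  # update gate
    n = np.tanh(Wx[2 * H:] + r * (U[2 * H:] @ h) + b[2 * H:])  # candidate
    return (1.0 - z) * n + z * h    # blend the candidate with the old state
```

With no separate cell state, the GRU needs three weight blocks instead of the LSTM's four, which is where its speed advantage comes from.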
ELM is a learning method for single-hidden-layer feedforward neural networks (SLFNs). The ELM model delivers excellent results at extremely fast learning speeds. Unlike classical feedforward network learning algorithms such as backpropagation (BP), ELM does not use a gradient-based method, as shown in
1. Generate the input layer's random weight matrix and bias. The weight matrix and bias have dimensions (j × k) and (1 × j), respectively, where j is the number of hidden nodes and k is the number of input nodes, as in
2. Determine the hidden layer output matrix. The initial hidden layer output matrix is obtained by multiplying the training data X by the transposed weight matrix and adding the bias, as in
3. Select an activation function. Any activation function can be used; this study uses the sigmoid activation function because it is simple to implement, as in
4. Determine the Moore-Penrose pseudoinverse. There are several methods for calculating the Moore-Penrose generalised inverse of H, for example orthogonal projection, the orthogonalization method, iterative methods, and singular value decomposition (SVD), as in
5. Compute the output weight matrix β.
6. Repeat Step 2 for the testing set, using the same weights and bias, to create a new H matrix. Then generate the output matrix ŷ using the learned β matrix.
All parameters are set only once because training is non-iterative, which results in a rapid training rate. The method is easy to understand and can be applied to difficult problems. As a result, ELM drastically reduces training time and converts the original nonlinear training problem into a linear one.
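The six ELM steps map onto a few lines of NumPy. This is a minimal sketch of a basic ELM for a binary target, not the paper's CELM: the class name, the normal initialization, and the toy data are assumptions. `np.linalg.pinv` computes the Moore-Penrose pseudoinverse via SVD, and the 0.5 decision threshold matches one of the threshold values explored for CELM.

```python
import numpy as np

class ELM:
    """Basic extreme learning machine following the six steps above."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden                  # j: number of hidden nodes
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def _hidden(self, X):
        # Steps 2-3: H = g(X W^T + b) with a sigmoid activation
        return self._sigmoid(X @ self.W.T + self.b)

    def fit(self, X, T):
        k = X.shape[1]                            # number of input nodes
        # Step 1: random input weights (j x k) and hidden biases (1 x j)
        self.W = self.rng.normal(size=(self.n_hidden, k))
        self.b = self.rng.normal(size=(1, self.n_hidden))
        H = self._hidden(X)
        # Steps 4-5: beta = pinv(H) @ T, computed once with no iterations
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        # Step 6: new H for the test inputs, then y_hat = H beta
        return self._hidden(X) @ self.beta

# Toy usage: a linearly separable binary target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
model = ELM(n_hidden=50).fit(X, y)
y_hat = (model.predict(X) > 0.5).astype(float)
accuracy = np.mean(y_hat == y)
```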
According to statistical learning theory, a learning model's actual prediction risk consists of two components: structural risk (SR) and empirical error (ER). Conventional ELM training minimizes only the ER, not the SR. A customized term is therefore introduced into the existing ELM to establish the Customized Extreme Learning Machine (CELM), which reduces structural risk. A lower SR improves the prediction rate of the learning model, as shown in
Here C is the regularisation parameter, and the terms
Here R = (1, 2, …, N) denotes the index set of the training samples. Taking the partial derivatives with respect to α, ε, and β and setting them to zero leads to
The output weight matrix is then obtained.
where I denotes the identity matrix.
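The derivation above has the shape of the standard L2-regularized ELM, whose closed-form output weights are β = (I/C + HᵀH)⁻¹HᵀT. Assuming the customized term corresponds to this penalty (the paper's final equation is not reproduced here), the sketch below computes β by solving the linear system instead of forming an explicit inverse. When there are fewer samples than hidden nodes, the equivalent form β = Hᵀ(I/C + HHᵀ)⁻¹T is cheaper.

```python
import numpy as np

def regularized_output_weights(H, T, C):
    """beta = (I / C + H^T H)^{-1} H^T T (L2-regularized ELM solution).

    A large C trusts the training error (approaching beta = pinv(H) T);
    a small C shrinks beta toward zero, lowering structural risk.
    """
    n_hidden = H.shape[1]
    A = np.eye(n_hidden) / C + H.T @ H
    return np.linalg.solve(A, H.T @ T)   # avoids an explicit matrix inverse

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 20))   # illustrative hidden-layer outputs
T = rng.normal(size=100)         # illustrative targets
beta = regularized_output_weights(H, T, C=10.0)
```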
This section presents experimental studies that validate the proposed daily churn prediction models and compare them with the daily and monthly churn prediction approaches described in Section 2. Our models are specifically compared with the following approaches. The measures used for model evaluation and comparison were Area Under the Curve (AUC), log loss, accuracy, and F1 score, together with Lift, the churn rate among the 10% of customers with the highest predicted churn probability divided by the churn rate of the whole client base, and the Expected Maximum Profit measure for Customer churn (EMPC). AUC is the most common of these metrics because it summarizes overall performance across all possible thresholds. EMPC is a newer metric designed specifically to measure the performance of churn forecasting methods given the cost and expected return of customer retention campaigns. Default parameter values (α = 6; β = 14; CLV = 200; d = 10; f = 1) were used to calculate the EMPC metric. Model hyperparameters were tuned by grid search; the explored hyperparameter space and the optimal values for each model are shown in
Hyperparameter values

| Model | Explored hyperparameter values | Best hyperparameter values |
|---|---|---|
| 1) RFM | Number of estimators = {10, 40, 100, 150} | Number of estimators = 100, maximum depth = 10 |
| 2) SBM | Number of estimators = {10, 50, 100, 200} | Number of estimators = 100 |
| 3) SA | C = 1/2 for a useless model and 1 for a perfect model | C = 1 |
| 4) MTS | dim r = 10; length n = 90 | Time series of length 90 |
| 5) DL models: RNN | No. of units: 2^k, k in range(3, 6) | |
| GRU | No. of units: 2^k, k in range(5, 9) | |
| LSTM | Number of units: range(1e4, 1e2) | |
| 6) Proposed customized extreme learning machine | Number of neurons (NN): integers in [1, 1000]; activation function (AF): sigmoid, sin, radial basis, hard limit, symmetric hard limit, satlins, tan-sigmoid, linear; threshold values (TV) = 0.40, 0.50, 0.60, 0.70 | Total number of models = 80,000 (NN: 584, AF: sin, TV: 0.50) |
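The threshold-based and ranking metrics described above can be computed directly from churn labels `y` (1 = churner) and predicted churn probabilities `p`. The sketch below is illustrative and uses NumPy only: AUC is computed as the probability that a randomly chosen churner is scored above a randomly chosen non-churner (ties count one half), which equals the ROC AUC, and lift uses the top 10% of scores. EMPC is omitted because it needs the full profit model with the parameters listed above.

```python
import numpy as np

def accuracy(y, p, threshold=0.5):
    return float(np.mean((p >= threshold) == y))

def f1(y, p, threshold=0.5):
    pred = p >= threshold
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    return float(2 * tp / (2 * tp + fp + fn)) if tp else 0.0

def log_loss(y, p, eps=1e-15):
    p = np.clip(p, eps, 1 - eps)        # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def auc(y, p):
    """P(score of a random churner > score of a random non-churner)."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]  # all churner/non-churner pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def lift_at(y, p, fraction=0.10):
    """Churn rate in the top `fraction` of scores over the overall churn rate."""
    n_top = max(1, int(round(fraction * len(y))))
    top = np.argsort(-p)[:n_top]
    return float(y[top].mean() / y.mean())

y = np.array([0, 0, 1, 1, 0, 1, 0, 0, 0, 0])
p = np.array([0.1, 0.2, 0.8, 0.7, 0.3, 0.9, 0.05, 0.4, 0.2, 0.1])
print(accuracy(y, p), f1(y, p), auc(y, p), round(lift_at(y, p), 3))
```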
The mean and SD values of the five metrics for each model within the test forecast window are shown in
| N = 11,303 | Predicted: NO | Predicted: YES |
|---|---|---|
| Actual: NO | 250 | 50 |
| Actual: YES | 3 | 11,000 |
| Model | Accuracy (%) | F1 score | Log loss | Lift | EMPC |
|---|---|---|---|---|---|
| 1) RFM model | 90.7 | 0.524 | 0.211 | 4.921 | 3.176 |
| 2) Statistical based model | 86.7 | 0.481 | 0.223 | 4.651 | 2.762 |
| 3) Survival analysis | 90.1 | 0.591 | 0.211 | 4.113 | 2.234 |
| 4) MTS model | 82.7 | 0.422 | 0.221 | 4.245 | 2.251 |
| 5) DL models: i) RNN | 82.2 | 0.399 | 0.249 | 4.311 | 2.436 |
| ii) GRU | 80.1 | 0.211 | 0.223 | 4.211 | 2.236 |
| iii) LSTM | 85.2 | 0.447 | 0.248 | 4.413 | 2.536 |
| 6) Proposed customized extreme learning machine | 96.4 | 0.623 | 0.181 | 4.232 | 2.321 |
| Model | AUC | F1 score | Log loss | Lift | EMPC |
|---|---|---|---|---|---|
| 1) RFM model | 0.004 | 0.011 | 0.003 | 0.076 | 0.212 |
| 2) Statistical model | 0.005 | 0.016 | 0.004 | 0.088 | 0.218 |
| 3) Survival analysis | 0.003 | 0.013 | 0.004 | 0.079 | 0.217 |
| 4) MTS model | 0.002 | 0.012 | 0.004 | 0.049 | 0.232 |
| 5) DL models: i) RNN | 0.003 | 0.013 | 0.003 | 0.057 | 0.294 |
| ii) GRU | 0.002 | 0.012 | 0.002 | 0.069 | 0.287 |
| iii) LSTM | 0.004 | 0.014 | 0.003 | 0.077 | 0.24 |
| 6) Proposed customized extreme learning machine | 0.006 | 0.017 | 0.005 | 0.089 | 0.211 |
| Model | AUC | F1 score | Log loss | Lift | EMPC | Overall average rank in terms of computational speed |
|---|---|---|---|---|---|---|
| 1) RFM model | 2.00 | 2.03 | 2.00 | 2.00 | 2.23 | 2.05 |
| 2) Statistical based model | 6.00 | 4.00 | 5.90 | 3.97 | 6.00 | 5.17 |
| 3) Survival analysis | 2.00 | 2.03 | 2.00 | 1.00 | 1.00 | 1.91 |
| 4) MTS model | 1.00 | 1.00 | 1.00 | 2.00 | 2.01 | 1.39 |
| 5) DL models: i) RNN | 4.00 | 5.07 | 4.33 | 5.63 | 4.00 | 4.61 |
| ii) GRU | 5.00 | 6.87 | 4.77 | 6.30 | 5.00 | 5.59 |
| iii) LSTM | 2.00 | 2.03 | 2.00 | 2.00 | 2.02 | 2.01 |
| 6) Proposed customized extreme learning machine | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
As shown in
First, t-distributed Stochastic Neighbour Embedding (t-SNE) is used to visualize the raw representation of the customers, which consists of 20K row vectors with 900 values each.
Studies that considered static churn modelling focused on forecasting churn monthly; as noted in the literature review, previous investigations treated churn as a static prediction problem. The research goal of this manuscript is to forecast churn daily based on dynamic variations in customer behavior. In this way, clients who are prone to churn can be identified in advance, and the maintenance cost of client management can be reduced accordingly. To this end, the research represented the customer's daily behavior using six different approaches and used this representation to formulate the problem of daily churn prediction. The findings demonstrate that the proposed CELM model forecasts churners in advance more accurately than monthly techniques. This is critical from a business standpoint for improving the effectiveness of retention campaigns. Furthermore, the research found that the frequency of the input is the main contributor to prediction accuracy, with daily behavior being preferable to monthly behavior. Beyond the retention effect, several significant aspects remain for future research. First, the model can be used to build a straightforward daily churn prediction model, but it lacks the causal insight needed for targeting, and more work is required on interpretability. Interpretable results would let companies target churning customers based on their reasons for leaving; one candidate is an LSTM with attention-based learning. The second issue concerns inaccurate churn prediction caused by targeting: it is insufficient simply to identify consumers at high risk of leaving, and businesses must also determine how to target customers based on how they respond to retention tactics.
The authors received no specific funding for this study.
The authors declare that they have no conflicts of interest to report regarding the present study.