Responding to complex analytical queries in a data warehouse (DW) is one of the most challenging tasks and requires prompt attention. The materialized view (MV) selection problem consists of selecting the optimal set of views that can answer as many queries as possible simultaneously. This work introduces a combined approach in which constraint handling is integrated with metaheuristics to select the optimal subset of views from the DW. The proposed work first refines the solutions to enable a feasible selection of views using the ensemble constraint handling technique (ECHT). Constraints such as the self-adaptive penalty, epsilon (
Generally, a view represents the result set of a query. If the query result is stored and periodically updated from the base table, it is known as a materialized view (MV). MVs are typically utilized in environments where the data is accessed frequently [
The major contributions of the proposed methodology are described as follows:
• An optimal MV selection procedure is introduced in this work, combining constraint handling with metaheuristic optimization-based selection steps.
• An effective ensemble constraint handling technique (ECHT) is presented to refine the solutions and to enable the framework to select the most optimal set of views for materialization.
• A combined approach is introduced in this work to select optimal MVs with minimized cost functions. The proposed framework hybridizes the Ebola and coot optimization algorithms to achieve the desired performance.
• The proposed methodology considers different cost-based fitness evaluations to reduce the query response time. The proposed combination of approaches provides optimal view selection and a lower query response time.
The rest of the paper is organized as follows:
To select the MVs optimally, Prakash et al. developed a multi-objective algorithm-based approach [
Moreover, it decreased the payment overhead of customers. The performance of the presented approach can be improved by utilizing an improved combination of methodologies. A stochastic ranking (SR) based cuckoo search (CS) optimization was introduced by Gosain et al. [
Many MV selection approaches have recently followed deep learning and machine learning strategies to attain performance benefits. The literature analyzes some popular and effective learning strategies that can be incorporated to achieve the MV selection task. An advanced version of the extreme learning machine (ELM) was introduced by Wang et al. [
Analyzing existing works is important for establishing the necessity of MV selection. These works addressed MV selection using different approaches; however, they failed to achieve low query response times. Moreover, an effective MV selection approach is needed to improve performance using MVs. Therefore, this work presents an optimal MV selection approach using an ensemble of constraint handling approaches and optimization approaches.
Parameters  Descriptions
Fitness value of ECHT
Distance measure
Penalty value
Epsilon constraint
Counter
Control generation counter
Top
Index number
Upper boundary
Lower boundary
Time
Best solution
Global best solution
Current best solution
Current location of
Current position of
Number of leaders
Index number of leader data
Leader position
Arbitrary numbers in the range [0,1]
*  Multiplication symbol
Finest position
Parameter to determine iteration
Current iteration
Total iterations
Query processing cost
Frequency count of queries
Accessing cost of queries
Maintenance cost
Updated frequency of queries
Maintenance cost of view
Query response cost
Total response cost
Fitness value of CHECO algorithm
Execution time
Initial time
Ending time
Total cost
Consumed cost of each process
The proposed model addresses the MV selection problem based on ECHT, with the ensemble constraints considered for optimizing the problem. The model introduces a novel algorithm, the constrained hybrid Ebola with coot optimization (CHECO) algorithm, for faster and optimal selection of views from the DW. The proposed CHECO algorithm chooses the top views based on satisfaction of the defined fitness. The schematic diagram of the proposed methodology is depicted in
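Before the individual operators are described, the overall selection loop can be sketched in a simplified, illustrative form. The binary view encoding, the function names, and the single-bit perturbation below are assumptions standing in for the paper's actual Ebola/coot operators, not the authors' implementation:

```python
import random

def checo_select(num_views, pop_size, iterations, fitness, top_k):
    """Skeleton of a CHECO-style selection loop (illustrative): each
    candidate is a binary vector marking which views to materialize;
    lower fitness (cost) is better."""
    pop = [[random.randint(0, 1) for _ in range(num_views)]
           for _ in range(pop_size)]
    for _ in range(iterations):
        offspring = []
        for cand in pop:
            child = cand[:]
            # Flip one view in or out (stand-in for the Ebola/coot moves).
            child[random.randrange(num_views)] ^= 1
            offspring.append(child)
        # Elitist survival: keep the pop_size cheapest candidates.
        pop = sorted(pop + offspring, key=fitness)[:pop_size]
    return sorted(pop, key=fitness)[:top_k]
```

With a toy fitness that simply counts materialized views, the loop converges toward the cheapest subsets while always returning the requested top-k candidates.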
The constraint handling approaches enhance the MV selection process and provide the optimal solution. Here, ensemble constraint handling is attained by integrating the following constraints.
In this step, two penalty types are combined for each individual to identify non-viable individuals. A higher penalty value is assigned to infeasible views, and a lower penalty value to feasible ones. A threshold value is then utilized to rank the feasible and infeasible solutions, and thus the optimal views are obtained. The ranking process of views is expressed in condition
Here,
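The penalty-based ranking above can be illustrated as follows. The (objective, violation) encoding and the way the penalty weight adapts to the feasible share of the population are assumptions in the spirit of self-adaptive penalty methods, not the paper's exact formula:

```python
def penalized_fitness(objective, violation, feasible_ratio):
    """Self-adaptive penalty (illustrative): the penalty weight grows
    as the share of feasible candidates in the population shrinks."""
    # No feasible candidates: rank purely by constraint violation.
    if feasible_ratio == 0.0:
        return violation
    # Otherwise add a violation penalty scaled by the infeasible share.
    weight = 1.0 - feasible_ratio
    return objective + weight * violation
```

Feasible candidates (violation 0) keep their raw objective, while infeasible ones are pushed down the ranking in proportion to how infeasible the population currently is.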
In this processing approach, constraints are handled by the
Here,
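The epsilon-based handling can be sketched as a pairwise comparison in the style of the standard ε-constrained method: when both candidates violate the constraints by at most ε, they are compared by objective; otherwise the less-violating one wins. The (objective, violation) encoding and function name are illustrative assumptions:

```python
def eps_better(a, b, eps):
    """Epsilon-constraint comparison (illustrative): a and b are
    (objective, violation) pairs; returns True if a is preferred."""
    fa, va = a
    fb, vb = b
    # Both within the epsilon tolerance: compare objectives directly.
    if va <= eps and vb <= eps:
        return fa < fb
    # Equal violation: fall back to the objective.
    if va == vb:
        return fa < fb
    # Otherwise the less-violating candidate wins.
    return va < vb
```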
The SR constraint handling scheme finds the best feasible solution by balancing the penalty and the objective function. It ranks individuals by comparing pairs of individuals according to a probability. If both yield feasible results, the individual with the lower objective value receives the higher rank. Moreover, if one has a non-viable result and the other a viable outcome, the higher rank is given to the individual with the viable outcome.
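The SR scheme is commonly realized as bubble-sort-like sweeps in which adjacent candidates swap by objective with some probability (else by violation). The sketch below follows that standard formulation; the parameter name `p_f` and the (objective, violation) pairs are illustrative assumptions:

```python
import random

def stochastic_rank(cands, p_f=0.45, sweeps=None):
    """Stochastic ranking (illustrative): repeated sweeps over adjacent
    (objective, violation) pairs; swaps use the objective when both are
    feasible or with probability p_f, and the violation otherwise."""
    cands = list(cands)
    n = len(cands)
    sweeps = n if sweeps is None else sweeps
    for _ in range(sweeps):
        swapped = False
        for i in range(n - 1):
            (fa, va), (fb, vb) = cands[i], cands[i + 1]
            both_feasible = va == 0 and vb == 0
            if both_feasible or random.random() < p_f:
                swap = fa > fb          # compare by objective
            else:
                swap = va > vb          # compare by violation
            if swap:
                cands[i], cands[i + 1] = cands[i + 1], cands[i]
                swapped = True
        if not swapped:
            break
    return cands
```

When every candidate is feasible, the procedure degenerates to a deterministic sort by objective, matching the rule that the lower objective gets the higher rank.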
This section presents CHECO, a combination of Ebola [
An arbitrary data search is performed on the randomly initialized data, moving in random directions to find the optimal data positions. At first, random populations are generated with the starting position set to zero. The ranges of the data, with upper and lower boundaries, are
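The bounded random initialization described above can be sketched as follows; the function name and the uniform-per-dimension sampling are illustrative assumptions:

```python
import random

def init_population(pop_size, dim, lower, upper, seed=None):
    """Illustrative bounded initialization: each candidate position is
    drawn uniformly between the lower and upper boundary per dimension."""
    rng = random.Random(seed)
    return [[lower + rng.random() * (upper - lower) for _ in range(dim)]
            for _ in range(pop_size)]
```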
The current best positions are identified and updated in time
Here,
The mean positions of two views are considered in the chain movement process to update the position, and it is expressed in the subsequent condition
Here,
The views update their positions and move toward the optimal position. Here, the movement is based on the mean position of the leaders: according to this mean position, all the views update their positions. The movement follows the expressed condition
Here,
Here,
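The two movement rules described above (chain movement toward the mean of two positions, and movement guided by the mean leader position) can be sketched as follows. The exact step sizes in the paper's conditions are not reproduced here; the halfway step is an illustrative assumption:

```python
def chain_move(pos_i, pos_prev):
    """Chain movement (illustrative): a view moves to the mean of its
    own position and the position of the view ahead of it."""
    return [(a + b) / 2.0 for a, b in zip(pos_i, pos_prev)]

def move_toward_leaders(pos_i, leaders):
    """Leader-guided movement (illustrative): step halfway toward the
    mean position of the current leaders."""
    dim = len(pos_i)
    mean_leader = [sum(l[d] for l in leaders) / len(leaders)
                   for d in range(dim)]
    return [(p + m) / 2.0 for p, m in zip(pos_i, mean_leader)]
```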
The positions of individuals are updated to reach the optimal location. The optimal position update is expressed in subsequent conditions
Here,
Here,
An optimal choice of views for materialization is necessary to achieve a lower query processing time. A DW is a huge data repository that supports query-based decision-making in an integrated environment. A DW contains many data records, and it is necessary to decrease the online query processing time. Furthermore, fitness is evaluated using condition
In this section, different processing cost measures are considered for computing the performance of the developed system. Multiple objective functions are developed for the CHECO algorithm, and these objective values are analyzed for each individual view in every iteration. This helps the algorithm in attaining the most optimal solution.
The processing cost is the cost of accessing the views required by the queries, weighted by their execution frequency. It is expressed in the subsequent conditions
Here,
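Since the paper's condition is not reproduced here, the following is a minimal sketch assuming the common frequency-times-access-cost formulation suggested by the notation (frequency count of queries, accessing cost of queries); the names are illustrative:

```python
def query_processing_cost(freq, access_cost):
    """Illustrative query-processing cost: sum over queries of the
    execution frequency times the cost of accessing the views that
    answer the query."""
    return sum(f * c for f, c in zip(freq, access_cost))
```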
It is the cost needed to refresh the views whenever their respective base relations are restructured. The calculation of maintenance is expressed by the subsequent condition
Here,
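A minimal sketch of this cost, assuming the update-frequency-times-per-view-cost formulation suggested by the notation (updated frequency of queries, maintenance cost of view); names are illustrative:

```python
def maintenance_cost(update_freq, view_cost):
    """Illustrative maintenance cost: sum over materialized views of
    the update frequency times the per-view maintenance cost."""
    return sum(u * m for u, m in zip(update_freq, view_cost))
```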
It is the cost used to respond to the queries. A reduced response cost enhances the performance of the system. It is calculated in the subsequent conditions
Here,
Here,
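The paper's response-cost conditions are not reproduced here; one common reading is that each query is answered from the cheapest materialized view able to serve it, falling back to the base tables otherwise. The dictionary encoding, the `"base"` key, and the function name below are illustrative assumptions:

```python
def query_response_cost(query_cost_on_views, materialized):
    """Illustrative response cost: a query is answered from the cheapest
    materialized view that can serve it, or from the base tables if no
    suitable view is materialized."""
    base_table_cost = query_cost_on_views["base"]
    options = [c for v, c in query_cost_on_views.items()
               if v in materialized]
    return min(options + [base_table_cost])
```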
The queries in the DW are selected using CHECO schemes for faster and optimal selection. Here, optimal MV selection is performed to respond to the queries in less time and processing cost.
This section evaluates the performance of the proposed MV selection using the ensemble approaches. The presented methodology is examined against different existing schemes in terms of performance metrics such as query processing cost, maintenance cost, total cost and execution time. The existing approaches are genetic algorithm-based MV selection (GAMVS), PSO-based MV selection (PSOMVS), ant colony optimization-based MV selection (ACOMVS) and coral reefs optimization-based MV selection (CROMVS) [
In this section, different performance metrics are deliberated to validate the performance of the proposed approach. The metrics are described in subsequent subsections.
Execution time is the time taken to execute the given number of queries in MV selection. It is computed by the subsequent condition
Here,
It is the cost taken to process the queries in the proposed methodology. It is computed by expressed condition
It is the cost needed to refresh the views whenever their respective base relations are restructured. The maintenance cost calculation is expressed by the subsequent condition
It is the total cost of all the processes in MV selection, such as the query processing, query response and maintenance costs. It is expressed in the subsequent conditions
Here,
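As a sanity check, the total cost is simply the sum of the component costs; in the comparison tables later in the paper, the maintenance and query processing costs of each method add up exactly to the reported total. The variadic form below is an illustrative sketch:

```python
def total_cost(*costs):
    """Illustrative total cost: the component costs (processing,
    response, maintenance, ...) summed, matching the additive
    formulation used in the comparison tables."""
    return sum(costs)
```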
This section analyses the performance of the presented methodology against different existing approaches. The comparison of the query processing of the proposed approach in terms of execution time is shown in
In
Methods  Maintenance cost  Query processing cost  Total cost 

ACOMVS  6,329,353,925,114  3,522,858,242,562  9,852,212,167,676 
GAMVS  6,329,354,098,494  3,522,858,302,421  9,852,212,400,915 
CROMVS  6,329,354,721,658  3,522,858,506,422  9,852,213,228,080 
PSOMVS  6,329,355,992,789  3,522,859,724,119  9,852,215,716,908 
In
In
Data size (GB)  Proposed (s)  ACOMVS (s)  PSOMVS (s)  GAMVS (s)  CROMVS (s) 

0.25  14  13  15  13  
0.50  85  70  90  81  
0.75  175  120  185  160  
1  250  165  265  225 
In
In
In this section, the statistical analysis of the proposed methodology is evaluated. First, the ANOVA test examines whether there is a significant difference between the results attained by the different techniques. The ANOVA test is conducted by varying the dataset sizes and the number of queries, and thus different results are examined. The TPCH dataset is considered for the ANOVA test in the proposed work. The test analysis for the number of queries with the TPCH dataset is mentioned in
Methods  Count  Sum  Average  Variance 

EA  4  485  102  7687 
GAMVS  4  497  124.25  11,148.25 
ACOMVS  4  470  117.5  10,443.66 
PSOMVS  4  269  67.25  3334.25 
SRCSMAVS  4  365  87  6745 
ProRes  4  324  92  4567 
CROMVS  4  434  108.5  9209 
FSAMVS  4  250  60.10  2,748.2 
Methods  Count  Sum  Average  Variance 

EA  4  578  98  9845 
GAMVS  4  568  142  11,610 
ACOMVS  4  540  135  10,574.6 
PSOMVS  4  352  88  3236 
SRCSMAVS  4  435  92  5679 
ProRes  4  410  78  8764 
CROMVS  4  509  127.25  9215.583 
FSAMVS  4  325  80  2764 
Methods  Overall cost  Execution time 

EA  Weak  Applicable 
GAMVS  Applicable  Weak 
ACOMVS  Applicable  Weak 
PSOMVS  Weak  Applicable 
SRCSMAVS  Moderate  Weak 
ProRes  Applicable  Moderate 
CROMVS  Moderate  Applicable 
FSAMVS  Applicable  Applicable 
Algorithms  Best  Mean  Median  Weighted  Rank 

SASS  0.0949  0.1046  0.0981  0.0984  2 
sCMAgES  0.2629  0.1713  0.1791  0.2186  3 
EnMODE  1.89  1.82  1.89  –  4 
The overall analysis states that the proposed framework is more suitable and effective for solving the MV selection problem than the recent techniques. The proposed framework established a combined approach to choose the optimal MVs with reduced cost values. Moreover, the constraint handling scheme helped the framework achieve the desired performance by refining the solutions. The analysis of the proposed framework regarding the cost functions proved that the model is more effective at choosing MVs that minimize the cost sharply than the other state-of-the-art techniques. The time taken by the framework is slightly larger than that of the compared techniques for larger dataset sizes, because of the increased number of computations involved in executing the algorithm; varying the dataset's size thus impacts the framework's overall computational efficiency. The significance of the framework is also proved through the statistical analysis conducted in the performance evaluation. From the results obtained, it is clear that the methodology is statistically more significant in selecting the optimal views than the compared techniques. The framework's main advantage is that it can select the most optimal set of views regardless of the size of the dataset or the number of queries involved. Also, the framework selects cost-effective views more effectively than the recently introduced techniques.
Overall, the proposed framework makes an effective contribution by selecting the most optimal MVs with minimized processing and maintenance costs for the customers. One of the main problems identified is the overall computational efficiency of the framework: as the size of the dataset increases, the time taken by the algorithm to execute increases, raising the computational complexity. Therefore, future work can aim to reduce the required computations, even for combined algorithms.
A comparison with the previous works using the same dataset has been made, and the results are presented in
Methods  Maintenance cost  Query processing cost  Total cost  Execution time 

Sohrabi et al. [  –  –  –  ~130 s 
Kharat et al. [  –  –  –  233.99 
Sohrabi et al. [  6,329,368,000,604  3,522,873,256,157  9,852,241,256,761  – 
Azgomi et al. [  6,329,368,000,604  3,522,873,256,157  9,852,241,256,761  226 s 
This paper presented an optimal selection of MVs using an effective combination of ensemble approaches. First, an ensemble combination of constraint-handling approaches was presented for the optimal selection of views. Here, constraints such as SR, epsilon, and the self-adaptive penalty were considered for optimal view selection. Afterwards, the hybrid Ebola and coot optimization was utilized for faster and optimal view selection. Fitness parameters such as the maintenance cost, query processing cost and response cost were considered to improve the performance. The performance of the developed MV selection was validated against different existing approaches in terms of performance metrics such as query processing cost, maintenance cost, total cost and execution time. The overall analysis suggested that the proposed approach is more optimal and effective than the other approaches. The proposed approach resulted in a query processing cost of 3,522,857,483,566 and a maintenance cost of 6,329,354,613,784, which is much more effective than the other algorithms. In the future, MV selection can be improved using further enhanced processes, and it can be analyzed with many benchmark datasets.
The authors received no specific funding for this study.
Popuri Srinivasa Rao: Conceptualization, Methodology, Analyzing, Software Improvements, Writing—Original Draft Preparation, Supervision, Investigation, Resources, Data Curation. Aravapalli Rama Satish: Conceptualization, Methodology, Software, Validation, Formal Analysis, Review & Editing, Visualization, Supervision, Project Administration.
TPCH data warehouse. Available at
The authors declare that they have no conflicts of interest to report regarding the present study.