An invariant is an essential relationship among program variables. Invariants are very useful in software checking and verification. Tools that detect invariants are called invariant detectors, and they come in two types: dynamic invariant detectors and static invariant detectors. Daikon is an available program that implements a particular dynamic invariant detection algorithm. It runs the tested program several times, gathers the values of its variables, and then detects relationships between the variables using a simple statistical analysis. This method has some drawbacks, the biggest of which is its long runtime. We observe that the runtime of the Daikon invariant detection tool depends on the ordering of traces in the trace file. We therefore propose a mechanism that reduces the differences between adjacent trace files by applying special mutation and crossover techniques of a genetic algorithm (GA). An experiment is run to assess the benefits of this approach. The experimental findings show that the proposed dynamic invariant detection algorithm runs faster than the original approach.

Program invariants are rules or equations among program variables that remain constant across successive program runs with various input parameters. Invariants can be used to develop correct and reliable programs. For example, in a program that searches for an element in an array of integers, the array elements must remain unchanged, and the index used to traverse the array must be at most equal to the array length when the function returns. Invariants have a significant impact on software engineering, especially automatic software engineering [
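To make the search example concrete, the following minimal Python sketch (the function name `linear_search` is hypothetical, and the paper itself contains no code) states the two invariants from the text as runtime assertions: the array is unchanged, and the traversal index is at most the array length when the function returns.

```python
from typing import List


def linear_search(arr: List[int], target: int) -> int:
    """Return the index of `target` in `arr`, or len(arr) if it is absent."""
    snapshot = list(arr)  # copy kept only to check the "array is unchanged" invariant
    i = 0
    while i < len(arr) and arr[i] != target:
        # Loop invariant: `target` does not occur in arr[0:i].
        assert target not in arr[:i]
        i += 1
    # Postconditions from the example: the elements are unchanged and the
    # index used to traverse the array is at most the array length.
    assert arr == snapshot
    assert 0 <= i <= len(arr)
    return i


print(linear_search([3, 1, 4], 4))  # prints 2
print(linear_search([3, 1, 4], 9))  # prints 3 (not found, so the index equals len(arr))
```

A dynamic detector such as Daikon works in the opposite direction: instead of checking assertions written by the programmer, it infers relations like `i <= len(arr)` from the values observed over many runs.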

Invariants are often useful when relating two different programs (even ones that use different algorithms) and can help programmers test and verify their programs. For example, when someone writes a program to organize a set of data, they can decide whether the program is correct or buggy by comparing its invariants with those of a known-correct version of the program; for instance, consider a

Ruthruff et al. [

Because invariants can help programmers improve the maintainability, readability, verification, documentation, and many other aspects of their programs, software engineering researchers have recently become interested in this field. Programmers are also encouraged to extract their programs' invariants and then test their programs against the detected invariants.

There are two general methods to detect invariants: (1) the static invariant detection approach and (2) the dynamic invariant detection approach. Each approach has its strengths and weaknesses. The output of dynamic methods is often unreliable; in this sense, dynamic invariant detection methods are considered unreliable. Static methods, in turn, are very difficult to develop. Both approaches are also considered time-consuming.

Artificial intelligence is important and useful in many applications [

This article first discusses related work in Section 2 and then briefly defines the problem and its motivation. Section 3 describes the dynamic invariant detection approach and also addresses the drawbacks of dynamic approaches. Section 4 elaborates on two ideas for optimizing Daikon's runtime and improving its speed. Section 5 compares the results of the proposed modifications with the previous version of Daikon on a couple of C programs. Section 6 experimentally shows that the proposed ideas are useful on real code and can decrease the runtime of the dynamic invariant detection approach. Finally, Section 7 presents the conclusion and future work.

As mentioned, there are two general methods for detecting invariants: static and dynamic. Static methods analyze the program source code using compiler techniques (e.g., extracting data-flow graphs from the source code). In contrast, dynamic methods gather information from program executions; profiling and testing are examples. That is, dynamic methods use the actual values observed while the program runs and infer statistical relationships between variables from those values [

Several tools, including ESC and KeY for Java and LCLint for C, are available to extract invariants statically [

Some of the greatest disadvantages of dynamic methods are as follows: (1) they are unreliable, (2) they are costly, and, above all, (3) they do not provide highly accurate results. Daikon, which implements one of the dynamic invariant detection algorithms, is the most suitable tool to date [

Another advantage of Daikon is that it is open-source software, so anyone can modify and improve it. Nevertheless, the tool has several disadvantages, and several studies have sought to improve its efficiency. Many modifications of Daikon have been presented so far. The newest Daikon version, for example, contains a number of techniques for handling equivalent variables, dynamically constant variables, elimination of weaker invariants, and variable hierarchy [

In order to create an approximate temporal model of actions, Silvaa et al. [

Mili et al. [

Costa et al. [

In this paper, the author proposes two techniques to improve the runtime efficiency of extracting program invariants (preconditions, postconditions, and loop invariants) during dynamic invariant detection (i.e., recording the values of variables at selected observation points during program executions and discovering statistical relations between the variables). The first technique reduces the number of program variables that must be evaluated. The second technique sorts the recorded data traces based on the number of variables whose values do not change across runs.

Program invariants are equations or rules among program variables that remain unchanged across successive program runs with various input parameters. Invariants can be applied to create correct and accurate programs. Invariants are detected in two general ways: static and dynamic. The main challenge in static detection is the difficulty programmers face in developing such detectors. Dynamic approaches, on the other hand, obtain information from program executions; two of their biggest drawbacks are that they are unreliable and time-consuming. Daikon, which implements a dynamic invariant detection algorithm, is the most suitable tool to date.

In this paper, our contribution is two ideas for improving the runtime of invariant extraction based on the dynamic invariant approach. To reduce the runtime, the factors that affect the runtime of dynamic methods must first be identified; we then try to decrease the runtime of the algorithm based on these factors. According to previous research, the main factor in the runtime is the number of variables in the scope of the tested program. Therefore, one idea is to reduce the number of variables that must be checked.

As seen in

Unary invariants are invariants over a single variable. For example,

Daikon also extracts latent (derived) variables and treats them just like other variables. Daikon finds three types of invariants at every program point: preconditions, postconditions, and loop invariants. Precondition invariants are rules between variables before entering the program point, postcondition invariants are rules between variables after exiting the program point, and loop invariants are rules between variables that hold on every loop iteration at the program point.

As previously described, Daikon implements the dynamic invariant approach. First, Daikon instruments the program: code is inserted into the original program at all program points to save the values of variables. Daikon can use one of two tools for this purpose: Chicory or Kvasir.

These are two different sub-tools (front ends) of Daikon, and both are open source. Chicory instruments Java programs, whereas Kvasir instruments C and C++ programs and runs only on Linux. Kvasir generates an “Enter Procedure” and an “Exit Procedure” record for every procedure in the source code. It writes the values of variables on “Enter Procedure” when program control enters the procedure (precondition) and on “Exit Procedure” when program control exits the procedure (postcondition), according to [
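The sketch below mimics, in Python, what an instrumenting front end does conceptually: it records variable values when control enters and exits a procedure. It is neither Kvasir nor Chicory, and the record format (labels of the form `name:::ENTER` / `name:::EXIT` stored in an in-memory `TRACE` list) is a simplified, hypothetical stand-in for a real Daikon data-trace file.

```python
import functools
import inspect
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical in-memory trace: a list of (program point, {variable: value}) records.
TRACE: List[Tuple[str, Dict[str, Any]]] = []


def instrument(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record argument values on entry, and arguments plus the return value on exit."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # "Enter Procedure" record (precondition sample); shallow copy, so
        # mutable arguments are shared between the ENTER and EXIT records.
        TRACE.append((f"{func.__name__}:::ENTER", dict(bound.arguments)))
        result = func(*args, **kwargs)
        # "Exit Procedure" record (postcondition sample): arguments plus return value.
        exit_vars = dict(bound.arguments)
        exit_vars["return"] = result
        TRACE.append((f"{func.__name__}:::EXIT", exit_vars))
        return result

    return wrapper


@instrument
def add(x: int, y: int) -> int:
    return x + y


add(2, 3)
for point, values in TRACE:
    print(point, values)
```

Running the instrumented program with many different inputs produces the multiple samples per program point that the statistical analysis then works on.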

As the user can see from

Algorithms that detect dynamic invariants have several drawbacks, and one of their major challenges is their long running time. Daikon's runtime depends on the number of variables in the assessed scope, the size of the programs, the number of program runs, and the number of templates against which the variables are evaluated, as stated by

Daikon only considers invariants involving at most three variables, i.e., invariants over one, two, or three variables; thus, the number of possible invariants is cubic in the number of variables. The next section proposes two ideas that, if applied, would substantially reduce the runtime needed for dynamic invariant detection [
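The cubic growth can be checked with a few lines of Python. The sketch below (function name hypothetical) only counts the variable tuples of size one, two, and three; assuming each tuple is then tested against every applicable template, the total number of candidate invariants is roughly this count multiplied by the number of templates per arity. The sizes 5, 10, and 20 match the function versions used in the experiments.

```python
from math import comb


def candidate_tuples(n_vars: int) -> int:
    """Count the variable tuples of size 1, 2, and 3 that can carry an invariant."""
    return comb(n_vars, 1) + comb(n_vars, 2) + comb(n_vars, 3)


for n in (5, 10, 20):
    print(n, candidate_tuples(n))  # 5 -> 25, 10 -> 175, 20 -> 1350
```

Doubling the number of variables from 10 to 20 multiplies the count by about 7.7, close to the factor of 8 expected from the dominant cubic term.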

As discussed in the previous sections, the runtime of the algorithm grows with the number of variables that must be checked for invariant detection. This article offers two new ideas to reduce the runtime by decreasing the number of variables that must be checked. We expect the following improvements to the Daikon source code to boost its performance considerably. Note that although the Daikon source code was changed, the output invariants remained exactly the same. Each idea is then evaluated to show its effect. Section 4.1 describes the property of variables that do not need to be tested. The second idea builds on the first: it sorts the data-trace files so that the runtime of the algorithm decreases. It is evaluated in Section 4.2.

This section introduces a property such that any variable having it does not need to be checked. Eliminating such variables effectively reduces the runtime, yet they have no effect on the final invariants. Recall that Daikon runs the program with multiple input parameters and then derives and tracks the values of the variables. Many derived variables, function parameters, and other variables keep the same values in subsequent runs, but Daikon still tests them in every run. Re-checking the values of unmodified variables in subsequent runs is unnecessary.
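A minimal sketch of this idea, under the assumption that a surviving candidate invariant only needs to be re-evaluated when at least one of its variables changed since the previous sample (re-evaluating a predicate on identical values gives the same answer). The three binary templates, the function name, and the dictionary-based samples are hypothetical; this is not how the modified Daikon source code implements the change.

```python
import operator
from itertools import combinations
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

Row = Dict[str, int]
Candidate = Tuple[str, Tuple[str, str]]

# Hypothetical binary templates; Daikon's real template set is much larger.
TEMPLATES: Dict[str, Callable[[int, int], bool]] = {
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
}


def check_invariants(
    rows: Sequence[Row],
    var_names: Sequence[str],
    skip_unchanged: bool = True,
) -> Tuple[Set[Candidate], int]:
    """Falsify binary candidates over `rows`; return survivors and evaluation count."""
    alive: Set[Candidate] = {
        (name, pair) for name in TEMPLATES for pair in combinations(var_names, 2)
    }
    evaluations = 0
    prev: Optional[Row] = None
    for row in rows:
        if prev is None:
            changed = set(var_names)  # first sample: everything is "new"
        else:
            changed = {v for v in var_names if row[v] != prev[v]}
        for cand in list(alive):
            name, (a, b) = cand
            # A surviving candidate held on `prev`; if neither of its variables
            # changed, evaluating it again on identical values cannot falsify it.
            if skip_unchanged and a not in changed and b not in changed:
                continue
            evaluations += 1
            if not TEMPLATES[name](row[a], row[b]):
                alive.discard(cand)
        prev = row
    return alive, evaluations
```

On the six data-trace files with variables Yr, d1, d2, and Return listed in Section 5, this sketch keeps the same 12 surviving invariants with or without skipping, but needs 78 evaluations in the original file order versus 54 when files with identical values are adjacent, which is exactly why the ordering of the files matters.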

For example, assume that there are three variables called

Section 5 discusses the results of this idea. The findings demonstrate that these modifications significantly reduce the runtime. As predicted, the runtime is best when the values of variables vary little between sequential runs and worst when there is no similarity between the values of variables in consecutive runs. This is why, in the second idea, discussed in depth in Section 4.2, the data-trace files are first sorted so that the values of variables vary as little as possible between consecutive files.

This subsection seeks an ordering of the data-trace files that minimizes the number of variables whose values change between consecutive files. (a) Sorting the data-trace files with deterministic (exhaustive) techniques is costly and very time-consuming because the number of possible orderings is factorial in the number of files. (b) Finding the best ordering of the data-trace files, i.e., the one that minimizes the changes in variable values between consecutive runs, is an NP-hard problem. (c) On the other hand, the best ordering is not required; an ordering close enough to the best is acceptable. From (a), (b), and (c), we conclude that a non-deterministic heuristic approach, such as a genetic algorithm, is an excellent choice [
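The quantity being minimized, and why a deterministic exhaustive search does not scale, can be sketched as follows. The function names are hypothetical, and each data-trace file is reduced to a tuple of variable values; exhaustive search is only feasible for a handful of files.

```python
from itertools import permutations
from typing import Optional, Sequence, Tuple

Row = Tuple[int, ...]


def transition_cost(order: Sequence[int], rows: Sequence[Row]) -> int:
    """Count variable values that change between consecutive rows in `order`."""
    return sum(
        sum(1 for x, y in zip(rows[a], rows[b]) if x != y)
        for a, b in zip(order, order[1:])
    )


def best_order_exhaustive(
    rows: Sequence[Row],
) -> Tuple[Tuple[int, ...], Optional[int], int]:
    """Try all n! orderings and return (best order, its cost, orderings examined)."""
    best_order: Tuple[int, ...] = ()
    best_cost: Optional[int] = None
    examined = 0
    for order in permutations(range(len(rows))):
        examined += 1
        cost = transition_cost(order, rows)
        if best_cost is None or cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost, examined
```

For six files the loop examines 6! = 720 orderings; with 20 files it would have to examine 20! (about 2.4 × 10^18), which is why a heuristic such as a genetic algorithm is preferred.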

The chromosome representation is the first aspect to be discussed in designing a genetic algorithm. In this problem, the length of each chromosome equals the number of data-trace files, and each gene holds a unique integer between one and the number of data-trace files. Because the value of each gene differs from all others, each chromosome is a permutation of the integers from one to the number of data-trace files. As an example, in six test cases in

Data-trace file | Yr | d1 | d2 | Return |
---|---|---|---|---|
1.dtrace | 4 | 4 | 5 | 10 |
2.dtrace | 4 | 4 | 5 | 10 |
3.dtrace | 2 | 2 | 3 | 5 |
4.dtrace | 5 | 5 | 6 | 11 |
5.dtrace | 5 | 5 | 6 | 11 |
6.dtrace | 5 | 5 | 6 | 11 |

Data-trace file | Yr | d1 | d2 | Return |
---|---|---|---|---|
5.dtrace | 5 | 5 | 6 | 11 |
6.dtrace | 5 | 5 | 6 | 11 |
1.dtrace | 4 | 4 | 5 | 10 |
2.dtrace | 4 | 4 | 5 | 10 |
4.dtrace | 5 | 5 | 6 | 11 |
3.dtrace | 2 | 2 | 3 | 5 |
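The permutation encoding can be sketched in Python as below; the helper names are hypothetical. Decoding the chromosome (5, 6, 1, 2, 4, 3) reproduces the file order of the reordered table above.

```python
import random
from typing import List, Sequence


def random_chromosome(n_files: int, rng: random.Random) -> List[int]:
    """A chromosome: a random permutation of the file numbers 1..n_files."""
    genes = list(range(1, n_files + 1))
    rng.shuffle(genes)
    return genes


def is_valid(chromosome: Sequence[int], n_files: int) -> bool:
    """True if every file number from 1 to n_files appears exactly once."""
    return sorted(chromosome) == list(range(1, n_files + 1))


def decode(chromosome: Sequence[int], files: Sequence[str]) -> List[str]:
    """Order the data-trace files as the chromosome prescribes (genes are 1-based)."""
    return [files[gene - 1] for gene in chromosome]


files = [f"{i}.dtrace" for i in range(1, 7)]
print(decode([5, 6, 1, 2, 4, 3], files))
# ['5.dtrace', '6.dtrace', '1.dtrace', '2.dtrace', '4.dtrace', '3.dtrace']
```

Because genes are distinct file numbers, any operator applied to the population must keep each chromosome a permutation, which constrains the choice of crossover and mutation.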

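The crossover operator is the next component to be discussed in designing a genetic algorithm; cycle crossover is selected in this paper. Cycle crossover creates a child in which every gene is inherited, at the same position, from one of the parents: the positions are partitioned into cycles, and each cycle is copied entirely from one parent, with successive cycles taken alternately from the two parents. See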
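A minimal Python sketch of standard cycle crossover on the permutation chromosomes follows. The text does not show the paper's exact variant, so this follows the textbook definition in which successive cycles are copied alternately from the two parents; the function name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def cycle_crossover(p1: Sequence[int], p2: Sequence[int]) -> List[int]:
    """Cycle crossover (CX) for permutations: alternate cycles come from p1 and p2."""
    n = len(p1)
    child: List[Optional[int]] = [None] * n
    position_in_p1: Dict[int, int] = {gene: i for i, gene in enumerate(p1)}
    cycle_index = 0
    for start in range(n):
        if child[start] is not None:
            continue  # position already filled by an earlier cycle
        source = p1 if cycle_index % 2 == 0 else p2
        i = start
        # Follow the cycle: the gene p2[i] sits at position_in_p1[p2[i]] in p1.
        while child[i] is None:
            child[i] = source[i]
            i = position_in_p1[p2[i]]
        cycle_index += 1
    # Every position is filled at this point.
    return [gene for gene in child if gene is not None]


print(cycle_crossover([1, 2, 3, 4, 5, 6, 7, 8], [8, 5, 2, 1, 3, 6, 4, 7]))
# [1, 5, 2, 4, 3, 6, 7, 8]
```

Since the genes at the positions of one cycle are the same set in both parents, copying whole cycles always yields a valid permutation, so no repair step is needed.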

The mutation probability is set to a small positive value close to zero; 0.001 is chosen here. Mutation is applied to every chromosome with this probability, and the operator swaps the values of two randomly chosen genes.
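The swap mutation described above can be sketched as follows (function and constant names are hypothetical). Swapping two genes keeps the chromosome a valid permutation.

```python
import random
from typing import List, Sequence

MUTATION_RATE = 0.001  # probability chosen in the text


def swap_mutation(chromosome: Sequence[int], rng: random.Random,
                  rate: float = MUTATION_RATE) -> List[int]:
    """With probability `rate`, swap the genes at two distinct random positions."""
    child = list(chromosome)  # never modify the parent in place
    if len(child) >= 2 and rng.random() < rate:
        i, j = rng.sample(range(len(child)), 2)  # two distinct positions
        child[i], child[j] = child[j], child[i]
    return child
```

With a rate of 0.001, roughly one chromosome in a thousand is mutated per generation, so crossover remains the main source of variation.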

Truncation selection is used as the selection operator of our genetic algorithm. In this selection, the chromosomes (children together with their parents) are first ranked by their fitness values, and the best of them are then chosen as the new population.
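A minimal sketch of truncation selection over the pooled parents and children. Because the fitness used in this paper is a count of discrepancies that should be as small as possible, "best" here means lowest cost; the cost function is passed in as a parameter, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Chromosome = List[int]


def truncation_selection(
    parents: Sequence[Chromosome],
    children: Sequence[Chromosome],
    cost: Callable[[Chromosome], int],
    pop_size: int,
) -> List[Chromosome]:
    """Pool parents and children, rank them by cost, and keep the best `pop_size`."""
    pool = list(parents) + list(children)
    pool.sort(key=cost)  # ascending: the lowest discrepancy ranks first
    return pool[:pop_size]
```

Keeping parents in the pool makes the selection elitist: the best ordering found so far can never be lost.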

The data-trace files are sorted so as to minimize the number of variables whose values change between successive runs. Accordingly, the fitness function is defined as the number of variables whose values change between consecutive data-trace files. As can be seen in
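The fitness function can be sketched in Python as below (names hypothetical). It decodes a chromosome into a file order and sums, over consecutive files, the number of variables whose values differ. The example values are those of the six Kvasir runs tabulated in Section 5.

```python
from typing import Dict, Sequence

Row = Dict[str, int]


def changed_variables(a: Row, b: Row) -> int:
    """Number of variables whose values differ between two data-trace files."""
    return sum(1 for name in a if a[name] != b[name])


def fitness(chromosome: Sequence[int], rows: Sequence[Row]) -> int:
    """Total changes between consecutive files in chromosome order (lower is better)."""
    ordered = [rows[gene - 1] for gene in chromosome]  # genes are 1-based file numbers
    return sum(changed_variables(p, q) for p, q in zip(ordered, ordered[1:]))


# Values of Yr, d1, d2, and Return in files 1.dtrace ... 6.dtrace (six Kvasir runs).
names = ("Yr", "d1", "d2", "Return")
values = [(1, 1, 2, 4), (3, 3, 4, 8), (4, 4, 5, 11), (2, 2, 3, 6), (4, 4, 5, 11), (2, 2, 3, 6)]
rows = [dict(zip(names, v)) for v in values]
print(fitness([1, 2, 3, 4, 5, 6], rows))  # 20: every consecutive pair differs in all 4 variables
print(fitness([1, 2, 6, 4, 3, 5], rows))  # 12: identical files are now adjacent
```

Since these files take only four distinct value combinations, at least three transitions of four changes each are unavoidable, so 12 is the lowest cost any ordering can reach here.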

Two C programs are used to conduct the experiments. Kvasir is run six times on the first program, whose source is shown in

Daikon's source code is then modified to avoid checking unmodified variables in the resulting data-trace files. In addition, the code is changed so that the word “Trace” is written each time a variable is tested. The word “Trace” appears sixteen times when the unmodified version of Daikon is run on the data in

As shown in

For a further illustration, the source code seen in

The subsequent calculations are based on six Kvasir runs for variables in

Data-trace file | Yr | d1 | d2 | Return |
---|---|---|---|---|
1.dtrace | 1 | 1 | 2 | 4 |
2.dtrace | 3 | 3 | 4 | 8 |
3.dtrace | 4 | 4 | 5 | 11 |
4.dtrace | 2 | 2 | 3 | 6 |
5.dtrace | 4 | 4 | 5 | 11 |
6.dtrace | 2 | 2 | 3 | 6 |

After the original Daikon has been run on the variable values listed in

Data-trace file | Yr | d1 | d2 | Return |
---|---|---|---|---|
1.dtrace | 1 | 1 | 2 | 4 |
2.dtrace | 3 | 3 | 4 | 8 |
6.dtrace | 2 | 2 | 3 | 6 |
4.dtrace | 2 | 2 | 3 | 6 |
3.dtrace | 4 | 4 | 5 | 11 |
5.dtrace | 4 | 4 | 5 | 11 |

To validate the approach experimentally, the algorithm is run on three versions of a merge sort function with 5, 10, and 20 variables.

The runtime of the original Daikon, in milliseconds, is depicted in

Runtime improvements of the proposed algorithm (including

It is also worth mentioning that

The greatest challenge in detecting program invariants is the huge cost of building detector software. Therefore, dynamic invariant detection methods have emerged as an alternative to static ones. However, the runtime of dynamic invariant detection methods is also a major issue. Since invariants play a significant role in software testing, reducing this runtime would certainly contribute to the field of software engineering. We use an open-source tool, Daikon, as our dynamic invariant detection algorithm. Daikon runs the tested program several times, gathers the values of its variables, and then detects relationships between the variables using a simple statistical analysis. We observe that the runtime of the Daikon invariant detection tool depends on the ordering of traces in the trace file, and we propose a genetic algorithm that reorders the traces to reduce the differences between adjacent trace files. We conclude that the larger the number of data-trace files or input variables, the more efficient the proposed algorithm becomes. Since real-world code may contain a very large number of variables, the improvement achieved in this paper is likely to be even more important in practice.

This paper was initially submitted on “August 5” by Hamid Parvin. Because a new author was added during revision, the EiC proposed declining the paper on “October 6” so that the authors could resubmit it. They resubmitted it on “October 10,” and it was finally accepted on “16 November.”