The research aims to improve the performance of image recognition methods that describe an image as a set of keypoint descriptors. The main focus is on speeding up the evaluation of relevance between object and etalon descriptions while maintaining the required level of classification efficiency. The class to be recognized is represented by an infinite set of images obtained from the etalon by applying arbitrary geometric transformations. It is proposed to reduce the descriptions in the etalon database by selecting the most significant description components according to the informativeness criterion. The informativeness of an etalon descriptor is estimated by the difference of the closest distances to its own and other descriptions. The developed method determines the relevance of the full description of the recognized object to the reduced descriptions of the etalons. Several practical classifier models with different options for establishing the correspondence between object descriptors and etalons are considered. The results of experimental modeling of the proposed methods on a database of museum jewelry images are presented. The test sample consists of images both from the etalon database and from outside it, subjected to geometric transformations of scale and rotation in the field of view. The practical problem of choosing the threshold for the number of votes, on which the classification decision is based, is examined. Modeling has shown that descriptions can be reduced tenfold in practice with full preservation of classification accuracy. A twentyfold reduction of the descriptions in the experiment leads to slightly decreased accuracy. The speed of the analysis increases in proportion to the degree of reduction.
The use of reduction by the informativeness criterion confirmed the possibility of obtaining the most significant subset of features for classification, which guarantees a decent level of accuracy.

In image classification tasks, which are now quite relevant for computer vision, a feature system is often formed as a set of multidimensional vectors that fully reflect the spatial properties of a visual object for effective analysis [

For human vision, different parts of an image have different weights in the process of analysis or classification [

Factors that characterize the importance (weight, significance) of features include the spread of values within the description and across the database of etalon descriptions, the degree of resistance to geometric transformations and interference, the uniqueness and value of the feature's presence in the image, etc. However, the specific level of the significance parameter, as well as the classification performance in general, is largely determined by the base of etalon images to be classified. Introducing the significance parameter into classification analysis allows us to move from a homogeneous influence of features to taking their relative weights into account, which directly affects such classification characteristics as performance and efficiency [

If the weight values calculated for the etalon components of the database descriptions are analyzed first, class descriptions can be constructed from the elements most valuable for classification, and the uninformative part of the description can be discarded [

The work aims to reduce computational costs when implementing structural methods of image classification while maintaining their effectiveness. A description in the form of a set of keypoint descriptors is reduced by forming a subset of descriptors according to the criterion of classification significance.

The objectives of the research are as follows:

To study the influence of the informativeness parameter for the set of descriptors of etalon descriptions on the effectiveness of image classification.

To reduce the etalon descriptions based on the value of informativeness.

To implement the informativeness parameter in the classifier model.

To experimentally evaluate the effectiveness of the developed classifier modifications in terms of accuracy and processing speed, based on simulation modeling for an applied image database.

Let us consider the set

The description

The finite set of binary vectors (descriptors) obtained by the keypoint detector creates a transformation-invariant description of the etalon or recognized image [

For each element

The matrices

It should be noted that the parameter

Let us set the task of developing a procedure

The second urgent task is to create a classifier

In the formulation under discussion, the classification of

It is worth noting that keypoint descriptors are a modern and effective tool for representing and analyzing descriptions of visual objects [

Improving the performance of structural classification methods based on comparing sets of vectors is being developed in such aspects as speeding up the process of finding component correspondences by clustering or hashing data [

The task of forming an effective subset of features, including by reduction, is constantly in the field of attention of computer vision researchers [

The data analysis literature studies some models for using feature significance to improve classification accuracy [

In measure

The weighting coefficients for keypoint descriptors have been used in several probabilistic classification models, where these coefficients are interpreted as the probability of classification [

Today, the applied performance of modern neural network systems [

Approaches based on the direct measurement of image features in the form of a set of descriptors have their advantages when implemented in computer vision systems [

It is clear that the keypoint descriptor apparatus does not have the ability to take into account the almost infinite variants of generative models for transformations in the formation of images by modern neural networks [

It is possible that these research areas (neural networks and structural methods) should be divided by application areas or applied tasks. For example, neural networks have difficulty coping with the variability of objects under geometric transformations. Structural methods, in turn, are sensitive to significant changes in the shape of objects (morphing) and require rewriting the etalon when tracking moving objects.

There is known research [

Thus, the conducted research indicates the need for a more detailed study of the process of implementing classification weighting indicators and evaluating their impact on the effectiveness of classification by description in the form of a set of descriptors both based on reducing the set of features and by direct use in classifiers in conjunction with metric relations.

An important factor that can affect the classification result can be considered the use of individual values of significance

To do this, let’s consider one of the practical schemes based on the initial classification of individual object descriptors with the subsequent accumulation of certain votes for their classes and values [

Considering that the order of components in the descriptions of the etalon and the recognized object

According to the traditional nearest-neighbor scheme, we will classify by calculating the minimum

The main metric used to evaluate the deviation of a pair of binary descriptors is the computationally efficient Hamming distance [

The model for implementing value largely depends on the chosen method of implementation

After performing

As a result of the first stage, the final number of

The final classification decision

Model

Let us define the possible practical conditions for refusing to define a class as

Condition

Along with the conditions

As shown by the results of our experimental modeling using specific models

By the traditional scheme

It is clear that in classification, the main result is the class number. Therefore, in practice, especially when the etalons are set, more productive approaches are often used, which are reduced to a stepwise search, where the first step is a search within the description for a fixed class [

It should be noted that the implementation of each of these modifications is associated with some peculiarities (the method of determining the relevance of descriptors, the choice of the threshold for the number of votes, significance, etc.). In addition, if the number of descriptors in a description is small or if the cardinalities of the compared descriptions are unequal, there are additional difficulties with the “friend-or-foe” distinction [

As an option, let us consider a classification method that consists of accumulating optimal values for each of the classes without first setting the class number for a particular descriptor, as is done in the traditional method

We find the minimum distances

This method is more focused on consistency with the etalon data, since each etalon “looks for its own” among the components of the object description. It is effective when the etalon and object descriptions differ in cardinality (number of descriptors). The maximum number of votes a class can receive equals the number of descriptors in its etalon.
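The etalon-driven search described above can be illustrated with a minimal sketch. It is not the authors' implementation: descriptors are stored as Python integers for brevity, and the function names `hamming` and `etalon_votes` as well as the distance threshold `max_dist` are assumptions introduced only for this example.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def etalon_votes(obj, etalons, max_dist):
    """Count votes per class when each etalon descriptor searches the object.

    For every descriptor of every etalon, the closest object descriptor is
    found; the class receives a vote if that distance does not exceed
    max_dist (a hypothetical acceptance threshold).
    """
    votes = []
    for etalon in etalons:
        count = 0
        for e in etalon:
            nearest = min(hamming(e, o) for o in obj)
            if nearest <= max_dist:
                count += 1
        votes.append(count)
    return votes


# Toy 8-bit descriptors; the ORB descriptors in the experiments are 256-bit.
etalons = [
    [0b00001111, 0b11110000, 0b10101010],  # class 0, three descriptors
    [0b11001100, 0b00110011],              # class 1, two descriptors
]
obj = [0b00001110, 0b11110000, 0b10101011, 0b01010101]

print(etalon_votes(obj, etalons, max_dist=1))  # [3, 0]
```

In this toy case class 0 collects three votes, which is the number of descriptors in its etalon and therefore the maximum it can receive, even though the object description contains four descriptors.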

Another practical approach to establishing the degree of relevance of two descriptions is to search for a minimum with double checking (Cross-Checking [

In general, the data analysis scheme by models

In the considered models, the class of the analyzed descriptor, taking into account the significance parameter, is determined based on two criteria: the value

One of the most effective ways to improve classification performance is to compress the feature space [

Given the peculiarities of the voting method, where the number of comparable descriptions is considered to be approximately the same, it is natural to consider the dimensions of the descriptions transformed after compression to be equivalent [

Our modification of the nearest neighbor method using the Cross-Checking model makes it possible to realize this. As a result of the reduction of the etalon descriptions, it becomes necessary to modify the parameters of the classifier: the decisive number of votes, the ratio of global and local maxima, etc.
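A minimal sketch of the double-checking idea follows: a pair of descriptors is accepted only when each is the nearest neighbor of the other. The function name `cross_check_matches` and the integer representation of descriptors are assumptions of this example, not details taken from the models above.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def cross_check_matches(obj, etalon):
    """Return (i, j) pairs where obj[i] and etalon[j] are mutual nearest neighbors."""
    if not obj or not etalon:
        return []

    def nearest(query, pool):
        # Index of the pool descriptor closest to the query (first one on ties).
        return min(range(len(pool)), key=lambda k: hamming(query, pool[k]))

    forward = [nearest(o, etalon) for o in obj]    # object -> etalon
    backward = [nearest(e, obj) for e in etalon]   # etalon -> object
    return [(i, j) for i, j in enumerate(forward) if backward[j] == i]


etalon = [0b00001111, 0b11110000, 0b10101010]
obj = [0b00001110, 0b11110000, 0b10101011, 0b01010101]

matches = cross_check_matches(obj, etalon)
print(matches)       # [(0, 0), (1, 1), (2, 2)]
print(len(matches))  # number of votes for this etalon: 3
```

The fourth object descriptor is closest to the first etalon descriptor, but that etalon descriptor prefers a different object descriptor, so the pair is rejected. OpenCV offers the same kind of check through `cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)`.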

Our analysis has shown that directly selecting a fixed, reduced number of keypoint descriptors by adjusting the keypoint detector parameters does not improve performance, because the classification accuracy decreases in proportion to the description reduction. It is more effective to first generate a large number of descriptors per description (500 or more) that reflect all image features and then reduce them according to the significance criterion.

It should be noted that if the number of descriptors for the etalons and the input image differs significantly, the probability of a degenerate situation when several etalons can be found in the input image increases somewhat. Thus, each reduction has its limits of application.

The classifier scheme using description reduction is shown in

Given that humans often form the recognition conditions in computer vision systems, the value of the matrix

Let us use the metric criterion of informativeness to calculate the matrix

When implementing normalized distances with a value of

The use of model

In [

Criterion

Thus, our proposals for improving structural classification methods [

1) In accordance with the classification scheme of

2) Based on the largest values of the indicator

3) We use the obtained subset of

The classification performance will be evaluated by the value of the accuracy index

Another important indicator for voting methods is the ratio of the maximum number of votes

The value

The main thresholds used in the paper are threshold

As for the equivalence threshold

Note that the value of the informativeness parameter

At the same time, the classification confidence index

For software modeling, the Python programming language, the OpenCV computer vision library, and the NumPy library were used to accelerate the processing of multidimensional data, and 500 256-bit Oriented FAST and Rotated BRIEF (ORB) descriptors were created to describe each image [

For testing, a set of input images from

For a given test set with several descriptors of

Despite the lower threshold of

Next, based on the calculation of indicator

Given the unequal number

It should be noted that the use of the

The value of

Based on our experiments in this and other research [

Based on the experiments, the threshold for the effective number of votes was statistically chosen as half of the number of descriptors (250 and 25, respectively). The threshold

Given that any progressive system strives for simplification, the

For the test reduced set at

The accuracy of

| Parameters | Threshold | | | | Time | | | | Accuracy | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of descriptors | 500 | 50 | 25 | 10 | 500 | 50 | 25 | 10 | 500 | 50 | 25 | 10 |
| Modification of the nearest neighbor | 250 | 25 | 15 | 8 | 0.28 | 0.027 | 0.014 | 0.0061 | 1 | 1 | 0.95 | 0.93 |
| Cross-Checking | 220 | 25 | 15 | 6 | 0.28 | 0.026 | 0.013 | 0.0056 | 1 | 1 | 1 | 0.96 |
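As a quick check of how processing time scales with the degree of reduction, the speedups implied by the time columns of the table can be computed directly. The sketch below uses only the table values; the helper name `speedups` is introduced for illustration.

```python
counts = [500, 50, 25, 10]  # descriptors per etalon description
times = {
    "Modification of the nearest neighbor": [0.28, 0.027, 0.014, 0.0061],
    "Cross-Checking": [0.28, 0.026, 0.013, 0.0056],
}


def speedups(t):
    """Speedup of each configuration relative to the full 500-descriptor case."""
    return [round(t[0] / x, 1) for x in t]


for name, t in times.items():
    for n, s in zip(counts, speedups(t)):
        print(f"{name}: {n} descriptors, reduction x{counts[0] // n}, speedup x{s}")
```

For the nearest-neighbor modification this gives speedups of about 10.4, 20.0, and 45.9 for reductions of 10, 20, and 50 times; for Cross-Checking the figures are about 10.8, 21.5, and 50.0, so the gain is close to proportional in both cases.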

For comparative evaluation, in our separate experiment, we directly initially determined 50 and 25 (instead of 500) ORB descriptors for the object and the etalons. Such a direct reduction of the analyzed data and its use for classification led to a significant deterioration of the

Thus, only the method of description reduction based on determining the informativeness of the etalon descriptors maintained a high level of classification accuracy. At the same time, the classification performance of the compressed description directly depends on the procedure of its formation. The simultaneous provision of high accuracy and classification performance is achieved by the procedure of stepwise reduction of descriptions for the database etalons based on the evaluation of the informativeness criterion

It should be noted that in the process of selecting descriptors by informativeness, its actual value changes and needs to be recalculated for further use, since the value of informativeness

Another caveat concerns the direct use of information content values in models of the form

Therefore, based on the set of 50 selected descriptors for the etalons, we recalculated the values of informativeness for them. They formed the ranges −19...+59, −28...+49, −9...+49, −27...+57, −21...+60 with average values of 36, 31, 33, 30, 38. Thus, the total range of informativeness values for the modified database was −21...+60.

Based on the obtained informativeness indicators, new descriptions of the etalons with 25 descriptors each were formed by selecting the largest values.
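The stepwise reduction with recalculation can be sketched as follows. The exact form of the criterion is an assumption here: informativeness is taken as the closest distance to descriptors of other etalons minus the closest distance to the remaining descriptors of the same etalon, which is one reading of the verbal definition. The names `informativeness`, `reduce_etalons`, and `stepwise_reduce` are hypothetical, and each etalon is assumed to hold at least two descriptors.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def informativeness(etalons, cls, idx):
    """Assumed criterion: nearest foreign distance minus nearest own distance."""
    own = etalons[cls]
    d = own[idx]
    d_own = min(hamming(d, x) for k, x in enumerate(own) if k != idx)
    d_other = min(
        hamming(d, x)
        for c, etalon in enumerate(etalons) if c != cls
        for x in etalon
    )
    return d_other - d_own


def reduce_etalons(etalons, keep):
    """Keep the `keep` most informative descriptors of every etalon."""
    reduced = []
    for cls, etalon in enumerate(etalons):
        scores = [informativeness(etalons, cls, i) for i in range(len(etalon))]
        order = sorted(range(len(etalon)), key=lambda i: scores[i], reverse=True)
        reduced.append([etalon[i] for i in order[:keep]])
    return reduced


def stepwise_reduce(etalons, sizes):
    """Reduce in stages, recomputing informativeness on each reduced database."""
    for keep in sizes:
        etalons = reduce_etalons(etalons, keep)
    return etalons


etalons = [
    [0b00001111, 0b00001110, 0b00111100],
    [0b11110000, 0b11110001, 0b00111000],
]
print([informativeness(etalons, 0, i) for i in range(3)])  # [4, 3, -2]
print(reduce_etalons(etalons, 2))
```

In the toy data the third descriptor of each etalon lies close to the other class, receives a negative score, and is dropped first. Calling `stepwise_reduce(etalons, [50, 25])` on real data would correspond to the 500 → 50 → 25 procedure, with scores recalculated after the first stage.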

The simulation showed that, for the database of descriptions with 25 descriptors each, only 2 images not belonging to the etalon database were falsely assigned to a class (false positives). At the same time, all images from the database were classified correctly. The accuracy is 0.95.

Similarly, out of the 50 descriptors selected in the first stage, the 10 most informative descriptors were identified. On the test set, 10 misclassified objects out of 51 were experimentally identified, and the accuracy was 0.81. The decrease in accuracy was more influenced by images not from the database, as the similarity of all images increases significantly for a small sample of features. If we exclude the condition for not exceeding the threshold

It should be noted that, according to the principles of data science [

Experiments with the accumulation of the parameter of descriptors’ informativeness as a variant of the model

To simplify the calculations, we divided the information content coefficients listed for the 50 descriptors into 4 intervals of approximately equal width, and assigned them interval weights of 1, 2, 3, 4, so that these weights could be accumulated along with the number of votes, as well as the product of the corresponding weight and the minimum distance obtained by the matching search. The analysis showed that the vast majority of informative etalon descriptors (39...43 out of 50 for different etalons) received interval coefficients of 1 and 2, i.e., have a significant level of information content. This can be explained by the fact that the analyzed data have already passed the selection process according to the criterion of informativeness. As a result of classification by the nearest neighbor modification, we have an example for a transformed image of the 2nd class in terms of accumulated votes [13, 43, 3, 4, 7], values [17, 70, 6, 6, 14], and products of values by distance [938, 2270, 332, 347, 866] with the parameter

As we can see from this example, the values obtained by the classifier for votes and importance are highly correlated. This can be explained by the fact that the applied informativeness criterion

To summarize the results of the research, we conducted a software modeling of the classifier for the same test data using the Cross-Checking model for double-checking the descriptors’ correspondence. For 500 descriptors, the accuracy of 1.0 has not changed, but to ensure it, it is necessary to reduce the threshold

The confidence factor

For a set of 50 descriptors, the maximum accuracy of 1.0 was achieved at the threshold of

An important result was obtained for a small number of descriptors of 10 elements, where the significance coefficient may have a greater impact. When choosing the threshold

At the same time, the classification by the accumulated significance of

When classifying 25 descriptors according to model

The analysis of the experiments and the content of

Cross-Checking provides a more accurate classification with a better value

The accumulated significance is correlated with the number of votes of the classes and can be used to confirm the decision. The classification performance by the accumulated significance almost coincides with the decision by the number of votes.

The method of calculating matches with Cross-Checking is more sensitive to the choice of threshold

The

The classification accuracy for the researched methods, especially when describing in 10 descriptors, can be improved by adaptive selection of

Implementation of the model

For small reduced volumes of description, it is advisable to apply a simple classification rule based only on the number of votes of the classes.
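A simple rule based only on the number of votes might look like the sketch below. The function name `classify_by_votes`, the use of `None` for refusal, and the non-strict comparison with the threshold are assumptions of this example; the vote vector and the threshold of 25 for 50 descriptors are the values reported in the experiments.

```python
def classify_by_votes(votes, min_votes):
    """Return the index of the class with the most votes, or None to refuse.

    The decision is refused when even the best class does not reach
    min_votes (assumed here to be a non-strict threshold).
    """
    best = max(range(len(votes)), key=lambda c: votes[c])
    return best if votes[best] >= min_votes else None


# Votes reported for a transformed image of the 2nd class (50 descriptors).
votes = [13, 43, 3, 4, 7]
print(classify_by_votes(votes, min_votes=25))                # 1, i.e. the 2nd class
print(classify_by_votes([13, 20, 3, 4, 7], min_votes=25))    # None: refusal
```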

The introduction of significance in the form of the data informativeness criterion into the process of structural classification enhances adaptability with class etalons and ensures informed decision-making. The use of the informativeness parameter opens up new possibilities for managing the process of data analysis in the course of classification. The key to the classification process is the metric relationship between descriptors and etalons.

The significance parameter can be successfully used only in situations where its value has variability across a set of descriptors or etalons. Classification using reduced descriptions of the etalons, compared to the full description, remains effective only if the description is reduced by selecting according to the information content criterion. Directly generating a smaller description significantly worsens the classification accuracy rate.

The use of different ORB and BRISK descriptors in the experiment confirms the universality of the proposed mechanism for reducing description data regardless of the type of detector.

The main result of the research is the establishment of high performance and efficiency of classifiers based on the reduced composition of the description of the etalons. The use of reduction to transform the set of image description descriptors makes it possible to significantly speed up processing without significant loss of classification accuracy. The processing speed increases in proportion to the degree of data reduction and was improved by a factor of 20 in the experiment. It is practically advisable to use 10% of the most informative descriptors, which provides a 10-fold increase in performance while maintaining full accuracy. If speed is a key criterion, then it is permissible to use even 5% of the original number of descriptors, which provides an increase in speed by almost 20 times, but with a slight decrease in accuracy to 0.95.

The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through Project Number PSAU/2023/01/25387. The authors thank the undergraduate students Oleksandr Kulyk and Viacheslav Klinov of the Department of Informatics, Kharkiv National University of Radio Electronics, for participating in the experiments, and the management of the Department of Informatics for supporting the research.

This research was funded by Prince Sattam bin Abdulaziz University (Project Number PSAU/2023/01/25387).

The authors confirm contribution to the paper as follows: research conception and design: Yousef Ibrahim Daradkeh, Volodymyr Gorokhovatskyi, Iryna Tvoroshenko, Medien Zeghid; methodology: Volodymyr Gorokhovatskyi, Iryna Tvoroshenko; data collection: Volodymyr Gorokhovatskyi, Iryna Tvoroshenko; analysis and interpretation of results: Volodymyr Gorokhovatskyi, Iryna Tvoroshenko; draft manuscript preparation: Iryna Tvoroshenko. All authors reviewed the results and approved the final version of the manuscript.

The authors confirm that the data supporting the findings of this study are available within the article.

Not applicable.

The authors declare that they have no conflicts of interest to report regarding the present research.