Graph convolutional networks, which model human body skeletons as spatiotemporal graphs, have achieved strong performance in skeleton-based action recognition. The BlazePose system makes it possible to extract skeletons from images on mobile devices, and a Spatial-Temporal Graph Convolutional Network (ST-GCN) can then predict the actions. The ST-GCN can be improved simply by replacing its skeleton input with a different set of joints that provides more information about the activity of interest. However, existing approaches require the user to set the graph topology manually and then keep it fixed across all input layers and samples. This research shows how to use the Stochastic Fractal Search (SFS)-Guided Whale Optimization Algorithm (GWOA). To obtain the best solution for the GWOA, we adopt the SFS diffusion process, which uses Gaussian random walks, a method common to growing systems. Continuous values are transformed into binary ones so that the optimizer can be applied to the feature-selection problem, and this is combined with the BlazePose skeletal topology to construct a novel implementation of the BlazePose topology for action recognition. In our experiments, we employed the Kinetics and NTU-RGB+D datasets. The achieved action recognition accuracy is 93.14% on X-View and 96.74% on X-Sub. In addition, the advantage of the proposed model is supported by several statistical analyses, including analysis of variance (ANOVA), the Wilcoxon signed-rank test, histograms, and time analysis.

BlazePose is a lightweight convolutional neural network architecture for human pose estimation, optimized for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone [

In contrast to heatmap-based methods, regression-based approaches are less computationally intensive and more scalable, but they attempt to predict mean coordinate values and therefore often cannot resolve the underlying ambiguity. It has been demonstrated in [

We divided the review into two categories, the first for action recognition and the second for feature selection.

Early methods for skeleton-based action recognition typically used hand-crafted features that exploited relative 3D joint rotations and translations. Deep learning introduced new algorithms that improve robustness and reach previously unattainable levels of performance, ushering in a new era of innovation in activity recognition [

Graph Neural Networks. In deep learning, the term “geometric deep learning” refers to the emerging techniques that generalize deep learning models to non-Euclidean domains such as graphs. The concept of a Graph Neural Network (GNN) was first described in [

Computer vision transformers. First proposed in [

Determining the best combination of features is difficult and computationally expensive. Recently, metaheuristics have proven to be useful and dependable methods for tackling various optimization problems [

In addition, the PSO algorithm was hybridized with the Bacterial Foraging Optimization method to improve the stability of a power system [

The authors’ goal in [

Another somewhat similar method can be found in reference [

The BlazePose system provides more information about joints than its predecessors and tracks them more accurately. We expect that the larger number of joints in the BlazePose skeleton, compared with other skeleton topologies such as OpenPose, will provide more information that helps improve the ST-GCN model’s performance. A key difference between the two systems is how they estimate the pose from an image: OpenPose works bottom-up, whereas BlazePose works top-down. The bottom-up method first identifies the body parts in the image and then assigns them to the right person; the top-down method first locates the person and then estimates the main joints. We propose a novel skeleton topology that can improve the performance of the ST-GCN model even further. The goal of the Enhanced-BlazePose topology is to represent the actions even more accurately by adding edges to the existing BlazePose topology. By adding feature selection layers with SFS-Guided WOA, we aim to capture how the shoulders and head move together during the activities. The experiments use the Kinetics dataset (D1) and the NTU-RGB+D dataset (D2).
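To make the topology discussion concrete, the sketch below builds the symmetrically normalized adjacency matrix D^-1/2 (A + I) D^-1/2, the form a single-partition ST-GCN layer uses, for the 33-joint BlazePose skeleton. The bone list follows the MediaPipe Pose landmark connections as we understand them. `EXTRA_EDGES` and `normalized_adjacency` are hypothetical names, and the two extra edges only illustrate how an enhanced topology adds links; they are not the Enhanced-BlazePose edge set. ST-GCN's spatial configuration partitioning would further split A into root, centripetal, and centrifugal subsets, which this sketch omits.

```python
from math import sqrt

NUM_JOINTS = 33  # BlazePose predicts 33 body landmarks

# Bones of the 33-landmark BlazePose layout (MediaPipe Pose connection list).
BLAZEPOSE_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 7), (0, 4), (4, 5), (5, 6), (6, 8),
    (9, 10), (11, 12), (11, 13), (13, 15), (15, 17), (15, 19), (15, 21),
    (17, 19), (12, 14), (14, 16), (16, 18), (16, 20), (16, 22), (18, 20),
    (11, 23), (12, 24), (23, 24), (23, 25), (24, 26), (25, 27), (26, 28),
    (27, 29), (28, 30), (29, 31), (30, 32), (27, 31), (28, 32),
]

# HYPOTHETICAL extra edges (nose to each shoulder) that illustrate how an
# enhanced topology could add links; this is not the paper's edge list.
EXTRA_EDGES = [(0, 11), (0, 12)]


def normalized_adjacency(edges, n=NUM_JOINTS):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop so each joint keeps its own features
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # undirected bone
    deg = [sum(row) for row in a]
    return [[a[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


base = normalized_adjacency(BLAZEPOSE_EDGES)
enhanced = normalized_adjacency(BLAZEPOSE_EDGES + EXTRA_EDGES)
```

Adding edges raises the degree of the joints involved, so the nose's self-weight drops from 1/3 in `base` to 1/5 in `enhanced`: information is spread over more neighbors.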

Nature-inspired optimizers have recently been introduced for feature selection, and they have been tested for their efficacy and their ability to escape local optima and reach the global optimum. Two standard approaches to assessing feature quality, the wrapper-based approach and the filter-based approach, are used. The whale optimization algorithm (WOA) is a population-based, stochastic swarm-intelligence algorithm inspired by the predation behavior of humpback whales. Its local search ability comes from a shrinking encircling mechanism and a spiral updating mechanism, whereas its global search ability comes from a random learning strategy in which whales move relative to randomly chosen peers. It benefits from having few control parameters, simple computation, and a robust ability to search for optimal solutions. The Guided WOA is an improved version of the original WOA. In its global search phase, the original WOA moves each whale relative to a single random whale, which can make the whales wander aimlessly. A more informed strategy can replace this random-whale search and guide the whales toward the best solution (the prey) much more quickly. To boost exploration efficiency, the Guided WOA lets a whale follow three random whales instead of one, which encourages whales to expand their search range. The SFS algorithm uses a diffusion process that generates random walks around the best solution found so far, which serves as their starting point. Incorporating this diffusion process improves the exploratory power of the Guided WOA and helps it reach the best option. Diffusion around the most up-to-date position uses Gaussian random walks as shown in
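The two update mechanisms described above can be sketched as follows. The equations are not reproduced in this section, so the forms here are assumptions based on how Guided WOA and SFS are commonly written: the exploration move combines three random whales with weights w1, w2, w3 drawn from [0, 2] and a coefficient z = 1 - (t / max_iter)^2, and the SFS diffusion uses the Gaussian walk Gaussian(BP, sigma) + (eps * BP - eps' * P) with sigma = |log(g) / g * (P - BP)|. All function names are hypothetical.

```python
import math
import random


def z_schedule(t, max_iter):
    """Assumed Guided-WOA coefficient that decays from 1 to 0 over the run."""
    return 1.0 - (t / max_iter) ** 2


def guided_woa_explore(x, r1, r2, r3, z, rng):
    """Move whale x using three random whales r1, r2, r3 (assumed update rule)."""
    w1, w2, w3 = (2.0 * rng.random() for _ in range(3))  # weights in [0, 2)
    return [w1 * a + z * w2 * (b - c) + (1.0 - z) * w3 * (xi - a)
            for xi, a, b, c in zip(x, r1, r2, r3)]


def sfs_gaussian_walk(point, best, generation, rng):
    """SFS-style diffusion: a Gaussian walk centred on the best point (assumed form)."""
    g = max(generation, 1)
    out = []
    for p, b in zip(point, best):
        # Step size scales with log(g)/g (peaks near g = e, then shrinks)
        # and with the distance between this point and the best point.
        sigma = abs(math.log(g) / g * (p - b))
        eps, eps2 = rng.random(), rng.random()
        out.append(rng.gauss(b, sigma) + (eps * b - eps2 * p))
    return out


rng = random.Random(42)
pop = [[rng.uniform(0.0, 1.0) for _ in range(5)] for _ in range(6)]
best = pop[0]  # assume whale 0 currently has the best fitness
x = pop[-1]
r1, r2, r3 = rng.sample(pop[1:-1], 3)  # three distinct random whales
moved = guided_woa_explore(x, r1, r2, r3, z_schedule(10, 100), rng)
refined = sfs_gaussian_walk(moved, best, generation=10, rng=rng)
```

Early in the run (z close to 1) the move is dominated by the difference between two random whales, which favors exploration; as z decays, the pull relative to the first random whale takes over.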

Each video was resized to 340 × 256 pixels, and frames in which the BlazePose system did not detect a skeleton were discarded. The length of each skeleton sequence was limited to 300 frames. As a result, many sequences contained fewer than 300 frames. If a sequence contained fewer than 300 frames, we repeated frames from its beginning until it reached the required length; if it contained more than 300 frames, we randomly removed the excess frames. The model was trained for 80 epochs, using spatial configuration partitioning for joint label mapping.
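The sequence-length normalization above can be sketched as below. This is one reading of the text: short sequences are padded by looping over their frames from the start, long sequences keep a random, order-preserving subset of 300 frames, and a missing skeleton is represented as `None`. The function names and the `None` convention are assumptions.

```python
import random

TARGET_LEN = 300  # maximum skeleton-sequence length used in the text


def drop_undetected(frames):
    """Remove frames where no skeleton was detected (represented here as None)."""
    return [f for f in frames if f is not None]


def fix_length(frames, target=TARGET_LEN, rng=None):
    """Pad short sequences by looping from the start; randomly thin long ones."""
    if not frames:
        raise ValueError("no frames with a detected skeleton")
    if rng is None:
        rng = random.Random()
    n = len(frames)
    if n < target:
        # Repeat frames from the beginning until the target length is reached.
        return [frames[i % n] for i in range(target)]
    if n > target:
        # Randomly choose which frames to keep, preserving temporal order.
        keep = sorted(rng.sample(range(n), target))
        return [frames[i] for i in keep]
    return list(frames)
```

For example, `fix_length(drop_undetected([0, None, 1, 2]))` returns 300 frames that start `[0, 1, 2, 0, 1, ...]`.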

Each solution of the hybrid SFS-Guided WOA is evaluated with a fitness function that depends on both the classification error rate and the proportion of selected features. A solution is considered effective if it reduces the number of selected features while maintaining or improving classification accuracy. The following equation is used to determine how effective each solution is:
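The fitness equation itself is missing from this section. Wrapper-based feature selection with binary metaheuristics commonly uses f = alpha * Err + beta * |S| / |D| with beta = 1 - alpha, where Err is the classification error, |S| the number of selected features, and |D| the total number of features; the sketch assumes that form and alpha = 0.99. Because the abstract states that continuous positions are converted to binary, it also includes a sigmoid transfer function, a common choice in binary optimizers, whose slope (10) and midpoint (0.5) are likewise assumptions. Function names are hypothetical.

```python
import math
import random


def to_binary(position, rng):
    """Sigmoid transfer: continuous position -> 0/1 feature mask (assumed form)."""
    mask = []
    for x in position:
        t = max(-500.0, min(500.0, -10.0 * (x - 0.5)))  # clip to avoid overflow
        prob = 1.0 / (1.0 + math.exp(t))
        mask.append(1 if rng.random() < prob else 0)
    return mask


def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """alpha * Err + (1 - alpha) * |S| / |D|; lower values are better."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total


rng = random.Random(0)
mask = to_binary([0.9, 0.1, 0.7, 0.4], rng)
score = fitness(error_rate=0.2, n_selected=sum(mask), n_total=len(mask))
```

With alpha close to 1, accuracy dominates and the feature-count term mainly breaks ties between solutions with similar error.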

The results are presented in two stages. The first reports the performance of SFS-Guided WOA on all feature-selection metrics. The second evaluates SFS-Guided WOA combined with the BlazePose skeletal topology, which constitutes a novel implementation of the BlazePose topology for action recognition.

Classification average error shows how accurate the classifier is given the selected feature set. The classification average error can be calculated in
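The referenced equation is truncated here. A common definition averages, over M runs, the fraction of test samples whose predicted label differs from the true label. The sketch assumes that definition; `average_error` is a hypothetical name, and each run is supplied as a `(predictions, labels)` pair.

```python
def average_error(runs):
    """Mean misclassification rate over runs of (predictions, labels) pairs."""
    rates = []
    for preds, labels in runs:
        wrong = sum(p != y for p, y in zip(preds, labels))
        rates.append(wrong / len(labels))
    return sum(rates) / len(rates)


# Two hypothetical runs: 1 of 4 wrong, then 0 of 2 wrong -> (0.25 + 0.0) / 2
err = average_error([([0, 1, 1, 0], [0, 1, 0, 0]), ([1, 1], [1, 1])])
```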

The best fitness is the smallest fitness value obtained by a particular optimizer across all
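Written out, and assuming that g_j^* denotes the best fitness reached at the end of run j out of M independent runs (symbols not defined in this section), the metric is:

```latex
\mathrm{Best} = \min_{j = 1, \dots, M} g_j^{*}
```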

To execute an optimization process

The average fitness size measures the average ratio of the number of selected features to the total number of available features. The equation for this metric is
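As a minimal sketch, assuming the best solution of each run is stored as a 0/1 feature mask, the metric averages the selected fraction over runs. `average_selection_size` is a hypothetical name.

```python
def average_selection_size(masks):
    """Mean fraction of selected features over the best masks of all runs."""
    return sum(sum(m) / len(m) for m in masks) / len(masks)


# Run 1 keeps 2 of 4 features, run 2 keeps all 4 -> (0.5 + 1.0) / 2
size = average_selection_size([[1, 0, 1, 0], [1, 1, 1, 1]])
```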

Given a dataset of

The standard deviation (Std) represents the variation of the best solutions found when a stochastic optimizer is run M different times. Std serves as an indicator of optimizer stability and robustness: a smaller Std means that the optimizer consistently converges to the same solution, whereas a larger Std indicates more variable results. Std is formulated as in
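A minimal sketch using Python's `statistics` module is shown below. It assumes the sample form with an M - 1 denominator; if the referenced formula divides by M instead, `statistics.pstdev` is the corresponding call. `fitness_std` is a hypothetical name.

```python
import statistics


def fitness_std(best_per_run):
    """Sample standard deviation (M - 1 denominator) of the per-run best fitness."""
    return statistics.stdev(best_per_run)


stable = fitness_std([0.2, 0.2, 0.2])      # identical runs -> 0.0
spread = fitness_std([1.0, 2.0, 3.0, 4.0])  # sqrt(5 / 3)
```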

Average error

| Dataset | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| D1 | 0.28648 | 0.2768 | 0.285465 | 0.27374 | 0.2806 | 0.27374 | |
| D2 | 0.22314 | 0.25091 | 0.230256 | 0.23596 | 0.24151 | 0.23083 | |

| Dataset | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| D1 | 0.4807 | 0.6107 | 0.4356 | 0.7057 | 0.6157 | 0.5757 | |
| D2 | 0.34237 | 0.56661 | 0.3897 | 0.40752 | 0.56358 | 0.46661 | |

| Dataset | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| D1 | 0.33638 | 0.32279 | 0.3367 | 0.3237 | 0.3305 | 0.32376 | |
| D2 | 0.25712 | 0.28462 | 0.23 | 0.2698 | 0.2753 | 0.26473 | |

| Dataset | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| D1 | 0.22476 | 0.22476 | 0.28599 | 0.2635 | 0.2635 | 0.22476 | |
| D2 | 0.18731 | 0.22115 | 0.17711 | 0.1957 | 0.1703 | 0.20423 | |

| Dataset | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| D1 | 0.41888 | 0.43829 | 0.36064 | 0.4382 | 0.4577 | 0.39946 | |
| D2 | 0.33115 | 0.33962 | 0.36332 | 0.3565 | 0.3734 | 0.38192 | |

| Dataset | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| D1 | 0.12759 | 0.13527 | 0.12759 | 0.1219 | 0.1262 | 0.12604 | |
| D2 | 0.11822 | 0.10904 | 0.15034 | 0.1180 | 0.1236 | 0.12324 | |

The time consumed by feature selection using the proposed approach and the other approaches is presented in

| Method | Time for D1 (s) | Time for D2 (s) | Average time (s) |
|---|---|---|---|
| SFS-GWOA | 31.194 | 33.612 | 32.403 |
| bGWO | 33.838 | 35.543 | 34.6905 |
| bPSO | 33.52 | 35.115 | 34.3175 |
| bSFS | 34.92 | 34.87 | 34.895 |
| bWAO | 33.327 | 34.448 | 33.8875 |
| bFA | 34.548 | 35.132 | 34.84 |
| bGA | 33.794 | 35.068 | 34.431 |

The statistical analysis of the results is presented in

| | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| Number of values | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| Minimum | 0.2708 | 0.2855 | 0.2768 | 0.2755 | 0.2737 | 0.2806 | 0.2737 |
| 25% Percentile | 0.2728 | 0.2865 | 0.2768 | 0.2855 | 0.2737 | 0.2806 | 0.2737 |
| Median | 0.2728 | 0.2865 | 0.2768 | 0.2855 | 0.2737 | 0.2806 | 0.2737 |
| 75% Percentile | 0.2728 | 0.2865 | 0.2768 | 0.2855 | 0.2737 | 0.2806 | 0.2737 |
| Maximum | 0.2728 | 0.2965 | 0.2968 | 0.2885 | 0.2937 | 0.2906 | 0.2797 |
| Range | 0.002 | 0.011 | 0.02 | 0.013 | 0.02 | 0.01 | 0.006 |
| Mean | 0.2725 | 0.2873 | 0.2789 | 0.285 | 0.2759 | 0.2818 | 0.2745 |
| Std. Deviation | 0.000578 | 0.0027 | 0.00578 | 0.00284 | 0.00578 | 0.00314 | 0.00200 |
| Std. Error of mean | 0.000154 | 0.0007 | 0.00154 | 0.00076 | 0.00154 | 0.00084 | 0.00053 |
| Sum | 3.816 | 4.023 | 3.905 | 3.99 | 3.862 | 3.945 | 3.843 |
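The last two rows of the descriptive-statistics table follow from the others: the standard error of the mean is SD / sqrt(n) and the sum is mean × n, with n = 14 values per optimizer. The short check below reproduces the SFS-GWOA column; `std_error` and `column_sum` are illustrative names, and the small gap in the sum presumably comes from rounding of the reported mean.

```python
import math


def std_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def column_sum(mean, n):
    """Sum of n values recovered from their mean."""
    return mean * n


# Reported SFS-GWOA column: SD = 0.000578, mean = 0.2725, n = 14.
print(round(std_error(0.000578, 14), 6))  # 0.000154
print(round(column_sum(0.2725, 14), 3))   # about 3.815; the table reports 3.816
```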

| | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| Number of values | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| Minimum | 0.2101 | 0.2201 | 0.2409 | 0.2203 | 0.226 | 0.2315 | 0.2208 |
| 25% Percentile | 0.2101 | 0.2231 | 0.2509 | 0.2303 | 0.236 | 0.2415 | 0.2308 |
| Median | 0.2101 | 0.2231 | 0.2509 | 0.2303 | 0.236 | 0.2415 | 0.2308 |
| 75% Percentile | 0.2101 | 0.2231 | 0.2509 | 0.2303 | 0.236 | 0.2415 | 0.2308 |
| Maximum | 0.2121 | 0.2331 | 0.2609 | 0.2403 | 0.246 | 0.2615 | 0.2408 |
| Range | 0.002 | 0.013 | 0.02 | 0.02 | 0.02 | 0.03 | 0.02 |
| Mean | 0.2104 | 0.2236 | 0.2509 | 0.2303 | 0.236 | 0.2429 | 0.2303 |
| Std. Deviation | 0.000578 | 0.00284 | 0.00392 | 0.00392 | 0.00392 | 0.00663 | 0.00484 |
| Std. Error of mean | 0.000154 | 0.00076 | 0.00104 | 0.00104 | 0.00104 | 0.00177 | 0.00129 |
| Sum | 2.945 | 3.131 | 3.513 | 3.224 | 3.303 | 3.401 | 3.225 |

| ANOVA table | SS | DF | MS | F (DFn, DFd) |
|---|---|---|---|---|
| Treatment (between columns) | 0.00256 | 6 | 0.00042 | F (6, 91) = 30.82 |
| Residual (within columns) | 0.00126 | 91 | 1.39E-05 | |
| Total | 0.00382 | 97 | | |

| ANOVA table | SS | DF | MS | F (DFn, DFd) |
|---|---|---|---|---|
| Treatment (between columns) | 0.01452 | 6 | 0.00242 | F (6, 91) = 138.8 |
| Residual (within columns) | 0.00158 | 91 | 0.0000174 | |
| Total | 0.01611 | 97 | | |
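The ANOVA rows are linked by MS = SS / DF and F = MS_between / MS_within. With 7 optimizers and 14 values each (98 observations), the degrees of freedom are 7 - 1 = 6 and 98 - 7 = 91, as reported. The sketch below is a plain one-way ANOVA; `one_way_anova` is an illustrative name, and a library routine would normally also return the p-value from the F distribution.

```python
def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a one-way ANOVA."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


# Cross-check of the first reported table using its SS and DF values.
f_reported = (0.00256 / 6) / (0.00126 / 91)  # about 30.8
```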

| | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| Theoretical median | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Actual median | 0.2728 | 0.286 | 0.276 | 0.285 | 0.273 | 0.280 | 0.273 |
| Number of values | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| Wilcoxon Signed Rank Test | | | | | | | |
| Sum of signed ranks (W) | 105 | 105 | 105 | 105 | 105 | 105 | 105 |
| Sum of positive ranks | 105 | 105 | 105 | 105 | 105 | 105 | 105 |
| Sum of negative ranks | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| P value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| Exact or estimate? | Exact | Exact | Exact | Exact | Exact | Exact | Exact |
| P value summary | *** | *** | *** | *** | *** | *** | *** |
| Significant (alpha = 0.05)? | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| How big is the discrepancy? | | | | | | | |
| Discrepancy | 0.2728 | 0.2865 | 0.276 | 0.285 | 0.273 | 0.280 | 0.273 |

| | SFS-GWOA | bGWO | bPSO | bSFS | bWAO | bFA | bGA |
|---|---|---|---|---|---|---|---|
| Theoretical median | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Actual median | 0.2101 | 0.2231 | 0.250 | 0.230 | 0.236 | 0.241 | 0.230 |
| Number of values | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| Wilcoxon signed rank test | | | | | | | |
| Sum of signed ranks (W) | 105 | 105 | 105 | 105 | 105 | 105 | 105 |
| Sum of positive ranks | 105 | 105 | 105 | 105 | 105 | 105 | 105 |
| Sum of negative ranks | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| P value | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| Exact or estimate? | Exact | Exact | Exact | Exact | Exact | Exact | Exact |
| P value summary | *** | *** | *** | *** | *** | *** | *** |
| Significant (alpha = 0.05)? | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| How big is the discrepancy? | | | | | | | |
| Discrepancy | 0.2101 | 0.2231 | 0.2509 | 0.2303 | 0.236 | 0.2415 | 0.2308 |
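Because all 14 values per optimizer are positive and the hypothesized median is 0, every signed difference is positive, so W equals 1 + 2 + ... + 14 = 105 for every optimizer, and any set of 14 positive values yields the same statistic. The sketch below computes the exact two-sided p-value by counting, for each possible W, how many of the 2^n sign assignments produce it; for n = 14 it gives 2 / 2^14 ≈ 0.000122, consistent with the reported 0.0001 at four decimal places. `wilcoxon_exact_p` is an illustrative name; in practice `scipy.stats.wilcoxon` provides this test.

```python
def wilcoxon_exact_p(n, w_plus):
    """Exact two-sided p-value of the signed-rank statistic W+ (n nonzero, untied differences)."""
    max_w = n * (n + 1) // 2
    # counts[w] = number of subsets of ranks 1..n whose sum is w (the null distribution of W+)
    counts = [0] * (max_w + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for w in range(max_w, r - 1, -1):
            counts[w] += counts[w - r]
    total = 2 ** n
    upper = sum(counts[w_plus:]) / total       # P(W+ >= w_plus)
    lower = sum(counts[: w_plus + 1]) / total  # P(W+ <= w_plus)
    return min(1.0, 2 * min(upper, lower))


n = 14
w_plus = n * (n + 1) // 2  # all differences positive -> W+ = 105
p = wilcoxon_exact_p(n, w_plus)
print(w_plus, round(p, 6))  # 105 0.000122
```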

Another way to demonstrate the effectiveness of the proposed approach is by visualizing the achieved results.

In addition, the plots shown in

The histogram of the achieved results is shown in

The action recognition results achieved by the proposed approach are compared to the previous ST-GCN, ST-GDN, and BlazePose methods. The comparison results are presented in

| Method | Top-1 | Top-5 |
|---|---|---|
| ST-GCN | 30.70% | 52.80% |
| ST-GDN | 37.30% | 60.65% |
| BlazePose, 50% st | 36.78% | 61.69% |
| BlazePose, 80% st | 37.38% | 65.20% |
| SFS-Guided WOA : BlazePose, 50% st | 51.79% | 77.13% |
| SFS-Guided WOA : BlazePose, 80% st | 56.87% | 81.44% |

The proposed approach is evaluated on the NTU-RGB+D dataset using the Cross-Subject (X-Sub) and Cross-View (X-View) criteria suggested by the dataset’s developers.

| Method | X-view | X-sub |
|---|---|---|
| ST-GCN | 81.50% | 88.30% |
| ST-GDN | 89.70% | 95.90% |
| BlazePose, 50% st | 87.30% | 90.34% |
| BlazePose, 80% st | 87.62% | 91.75% |
| SFS-Guided WOA : BlazePose, 50% st | 91.33% | 94.56% |
| SFS-Guided WOA : BlazePose, 80% st | 93.14% | 96.74% |

This study introduces a new action-recognition method that builds the BlazePose skeleton topology on top of the ST-GCN architecture and selects features with SFS-Guided WOA. We chose the Kinetics and NTU-RGB+D benchmark datasets to provide a reliable basis for comparison with the baseline model in. For visual data acquired in unconstrained settings, we advocate using alternative skeleton-detection approaches to improve the model’s performance. We compared the proposed strategy with BlazePose, ST-GCN, and ST-GDN and demonstrated that BlazePose’s topology can be improved by selecting the appropriate features for the feet and hands, which yields more precise information about the captured motion. In addition, the topology suggested in this research can improve performance even further. A potential drawback of the proposed methodology is the complexity of its feature selection methods; the practical impact of this drawback can be assessed by applying the methodology to different case studies, which is planned for future work.

The authors received no funding for this study.

The authors declare that they have no conflicts of interest to report regarding the present study.
