Driven by recent trends in remote health monitoring, real-time applications for measuring Heartbeat Rate and Respiration Rate (HARR) from video signals are growing rapidly. Photoplethysmography (PPG) operates by estimating minute changes in the color of the human face, together with the rigid motion of the facial skin and head. Ballistocardiography (BCG) is a noninvasive technique that produces a graphical record of the heartbeat from the repetitive body movements induced by cardiac pulses. The major difficulty in processing real-time video signals is robustness against motion artifacts caused by luminance fluctuation and variation in the patient's mobility. In this research, a video-based HARR measuring framework is proposed based on combined PPG and BCG. Noise in the input video signals is removed using an Adaptive Kalman Filter (AKF). Three different algorithms are then used to estimate the HARR from the noise-free signals: the signals are subjected first to Modified Adaptive Fourier Decomposition (MAFD), then to Enhanced Hilbert Vibration Decomposition (EHVD), and finally to Improved Variational Mode Decomposition (IVMD), yielding three separate HARR estimates. Comparing the obtained values shows that EHVD gives better results than the other two methods.

With the rapid growth of remote medical monitoring, it is unsurprising that video-based heart rate monitoring is gaining popularity [

With the help of a consumer-grade camera and ambient light, Rong et al. devised a method for measuring remote plethysmographic signals [

The proposed model for measuring the HARR is framed using a combined PPG and BCG model. The main objective is to calculate the frequency of heartbeat

The input to this research is video captured in real time. The captured video is split into frames using a hybrid video segmentation (HVS) method, which combines object-based video segmentation with keyframe extraction. The resulting frames are denoised and then subjected to IVMD, MAFD, and EHVD to determine the HARR value.

The hybrid video segmentation (HVS) method combines key frame extraction with object-based video segmentation. A statistical model obtained by training is used to support object-based video segmentation through key frame extraction. Shot-based video segmentation and object-based segmentation are the two components needed to segment the video signals. Key frame extraction provides a compact representation of the video that retains the most salient structures of its content. Joint spatiotemporal video segmentation extracts the video objects through a clustering method; the result can be used to classify the whole video and to enhance the combined segmentation through key frame refinement. The HVS algorithm consists of three processes:

Extraction of key frame for the shot abstraction of video

Object segmentation using model-based clustering

Key frame refinement

An improved key frame extraction algorithm, with certain attributes modified, is used to obtain a condensed representation of the input video, and a modified Gaussian mixture model (GMM) is then used to extract the required video objects. Finally, the trained GMM is used to refine the extracted key frames, yielding a more condensed video shot representation. The architecture for video object prediction is illustrated in

Consider one video shot _{i} and n_{j} is found using the

A higher similarity value means that the two frames have more similar histograms. When a new cluster is added to the group of clusters, its centroid must be calculated first. Key frames are extracted from the sequence of clusters by comparing the similarity with the threshold value T.
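The clustering step described above can be sketched as follows. The similarity formula itself is not recoverable from the text, so this sketch assumes histogram intersection; the function names, the bin count, the value of `T`, and the choice of the first frame of each cluster as its key frame are all illustrative assumptions rather than the authors' settings.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // max_value] += 1
    total = float(len(frame))
    return [c / total for c in counts]


def similarity(h1, h2):
    """Histogram intersection (assumed measure): 1.0 if identical, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def extract_keyframes(frames, T=0.8):
    """Group consecutive frames into clusters and return one key frame index per cluster."""
    clusters = []  # each cluster: {"centroid": histogram, "members": [frame indices]}
    for idx, frame in enumerate(frames):
        h = histogram(frame)
        if clusters and similarity(h, clusters[-1]["centroid"]) >= T:
            c = clusters[-1]
            n = len(c["members"])
            # Update the centroid as the running mean of the member histograms.
            c["centroid"] = [(m * n + x) / (n + 1) for m, x in zip(c["centroid"], h)]
            c["members"].append(idx)
        else:
            # Similarity below T: start a new cluster whose centroid is this frame.
            clusters.append({"centroid": h, "members": [idx]})
    # Use the first frame of each cluster as its key frame (illustrative choice).
    return [c["members"][0] for c in clusters]


# Toy 16-pixel grayscale frames: two dark, three bright, one dark again.
dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, bright, bright, bright, dark]
print(extract_keyframes(frames))
```

Each frame is compared only with the most recent cluster, which keeps the pass sequential and suits shot-by-shot processing; comparing against all existing clusters would be an equally plausible reading of the text.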

In this approach, object-based segmentation of the video is performed using the GMM. The Gaussian distribution is used because it is highly tractable, and the central limit theorem guarantees that the sum of many independent random variables tends toward a Gaussian distribution. The GMM therefore performs well without strong assumptions about the data. Probabilistic video segmentation is used to extract the objects from the video segments. The probability space is determined by modelling the feature samples as drawn from a set of Gaussian mixtures. Density estimation with a GMM is semi-parametric: the model is governed by the complexity of the data rather than by the size of the data set.
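As a rough illustration of GMM-based clustering, the sketch below fits a one-dimensional two-component mixture with the standard expectation-maximization (EM) updates and labels each sample with its most probable component. It is not the authors' modified GMM: the one-dimensional toy features, the initial means, and all function names are assumptions made for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(samples, means, n_iter=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM; `means` supplies the initial component means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            weights[j] = nj / len(samples)
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, samples)) / nj,
                var_floor,
            )
    return weights, means, variances


def assign(x, weights, means, variances):
    """Label a sample with the index of the most probable mixture component."""
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))


# Toy "pixel features": a dark background and a brighter object.
samples = [0.9, 1.0, 1.1, 1.2, 0.8, 5.0, 5.1, 4.9, 5.2, 4.8]
weights, means, variances = fit_gmm_1d(samples, means=[0.0, 6.0])
labels = [assign(x, weights, means, variances) for x in samples]
print(labels)
```

In a real segmentation the samples would be multidimensional per-pixel feature vectors with full covariance matrices; the scalar case is used here only to keep the EM updates readable.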

The raw video data, which lies in time-space, is transformed into a multidimensional feature space in which the feature vectors are given a topology for regularisation, such as the patterns of motion, colour, and texture in the video. Feature selection identifies the effective features, but it is not possible to extract the whole content because of dimensionality variation. The effectiveness of the features depends on the selection and extraction methods applied to motion, colour, and texture. This approach uses pixel-wise feature extraction, which operates directly on the video data: features are extracted for every pixel in each frame.
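A minimal sketch of pixel-wise feature extraction is given below, assuming that each pixel is described by its RGB colour plus a simple motion cue, namely the absolute luminance change from the previous frame. The text does not specify the feature set, so this choice, the omission of texture features, and the function names are assumptions.

```python
def luminance(pixel):
    """Luma of an (r, g, b) pixel using the ITU-R BT.601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def pixel_features(prev_frame, frame):
    """Return one feature vector (r, g, b, motion) for every pixel of `frame`.

    Frames are lists of rows, and each row is a list of (r, g, b) tuples.
    """
    features = []
    for row_prev, row_cur in zip(prev_frame, frame):
        for p_prev, p_cur in zip(row_prev, row_cur):
            # Motion cue: absolute luminance change between consecutive frames.
            motion = abs(luminance(p_cur) - luminance(p_prev))
            features.append((p_cur[0], p_cur[1], p_cur[2], motion))
    return features


# Two 2x2 frames; only the top-left pixel changes between them.
prev_frame = [[(10, 10, 10), (10, 10, 10)], [(10, 10, 10), (10, 10, 10)]]
frame = [[(110, 110, 110), (10, 10, 10)], [(10, 10, 10), (10, 10, 10)]]
features = pixel_features(prev_frame, frame)
print(features[0], features[1])
```

The resulting list of feature vectors is the kind of input that a mixture model clusters into objects and background.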

Key frame extraction facilitates object-based video segmentation. The clustering results are used to refine the key frames, which makes the shot-oriented representation more compact because of the GMM. Key frames are extracted with the help of the threshold value T, which makes the selection of video frames efficient and is particularly needed for object-based representation. After key frame extraction, a key frame set S is obtained as

The distance between the

The distance between two successive key frames kn_{i} and kn_{j} is then calculated using the following mathematical expression

The raw video signals are segmented into video frames, and the shot video signals are interpolated to 23 frames per second. The obtained signal X(t) is then normalized as

where η and λ are the mean and standard deviation of X(t). The Kalman filter smooths the signal in order to amplify the heart pulse and respiration pulse. Once the noise has been attenuated, the signal is passed through a band-pass FIR filter. Finally, the heart rate and the respiration rate are estimated from the signal using the algorithm designed for real-time prediction from video signals. Robustness and accuracy are controlled using the Lomb periodogram. The algorithm is shown below in

Algorithm For Adaptive Kalman Filter |
---|
Input: Heart Rate Signal |
Output: Heart rate and Respiration Rate |
Processes: |
`function [x_aposteriori, P_aposteriori] = KalmanFilterIteration(z, Q, R, x_aposteriori_last, P_aposteriori_last)` |
`x_apriori = x_aposteriori_last;                    % predict: state carried over` |
`P_apriori = P_aposteriori_last + Q;                % predicted error covariance` |
`K = P_apriori/(P_apriori + R);                     % Kalman gain` |
`x_aposteriori = x_apriori + K * (z - x_apriori);   % correct with measurement z` |
`P_aposteriori = (eye(length(x_aposteriori)) - K) * P_apriori;   % updated covariance` |
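The normalization and the filter iteration above can be sketched together in Python. This is a scalar port of `KalmanFilterIteration` under stated assumptions: the normalization is taken to be the usual z-score (X(t) − η)/λ implied by the definitions of η and λ, and the values of `Q`, `R`, the initial covariance, and the synthetic 0.3 Hz test signal are illustrative choices, not values from the paper.

```python
import math
import random


def zscore(signal):
    """Normalize to zero mean and unit standard deviation (assumed form of the normalization)."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)
    return [(x - mean) / std for x in signal]


def kalman_iteration(z, Q, R, x_last, P_last):
    """One scalar predict/update step, mirroring KalmanFilterIteration."""
    x_prior = x_last                 # state assumed constant between samples
    P_prior = P_last + Q             # uncertainty grows by the process noise Q
    K = P_prior / (P_prior + R)      # Kalman gain
    x_post = x_prior + K * (z - x_prior)
    P_post = (1.0 - K) * P_prior
    return x_post, P_post


def kalman_smooth(signal, Q=0.01, R=0.1):
    """Run the iteration over a whole signal, starting from the first sample."""
    x, P = signal[0], 1.0
    out = []
    for z in signal:
        x, P = kalman_iteration(z, Q, R, x, P)
        out.append(x)
    return out


def roughness(signal):
    """Mean squared difference between consecutive samples."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:])) / (len(signal) - 1)


rng = random.Random(0)
fs = 23.0  # frames per second, as stated in the text
raw = [math.sin(2 * math.pi * 0.3 * k / fs) + rng.gauss(0.0, 0.3) for k in range(230)]
normalized = zscore(raw)
smoothed = kalman_smooth(normalized)
print(roughness(normalized), roughness(smoothed))
```

A larger `Q` relative to `R` makes the filter follow the measurements more closely, while a smaller ratio smooths more strongly at the cost of lag, so the ratio would need tuning against the expected pulse frequencies.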

The Kalman filter is used to filter out unwanted components and recover the original signal. It is a nonstationary recursive filter that estimates the desired signal from a noisy background. In steady state, the Kalman filter is described by two stochastic equations

Here, _{k} is the column vector which represents the signal vector with no motion. The estimated value B_{k} is a scalar quantity.

The vector μ_{k} is the state transition noise and w_{k} is the measurement noise. The matrix X is determined at time step k−1 in the absence of noise, and its values are given below

The Kalman filter normally consists of two parts: equations updated over time and equations updated from the measurements. The time-update equations are

The measurement-update equations are

Here, Γ_{k} is the Kalman gain, and the error covariance estimate ρ_{k} is set up as a 3 × 3 matrix. The error covariance prediction is then made with the value ρ^{−1}. This can be written in matrix form as

To derive the constants X and Y, the value of A_{k} must be determined at a uniform sampling rate. Setting A_{k} to _{k + 1}, we get

The derivative approximation is expressed as

From the above equations it is clear that the estimated value B_{k} is much lower than the predicted value, and the final expression for the filter design is formulated as

Smaller values of α and β mean that A_{k + 1} exceeds B_{k + 1}, which indicates that the predicted heart pulse and respiration pulse are markedly amplified.

In this research, MAFD supports the adaptive decomposition of the video frames when predicting the HARR value. The obtained frames are grouped as F(t), which is placed in H-space and decomposed into a series of mono components _{m}(t) and a standard remainder Ψ_{N}.

MAFD uses a rational system to maintain orthogonality, fixing the functions used to determine the HARR value. The main process in MAFD is to extract the mono components in sequence, from the high-energy components to the low-energy components. The energy relation is estimated by fixing the corresponding value of the standard remainder Ψ_{N}.

To achieve a higher convergence rate, the energy of the standard remainder Ψ_{n} is kept to a minimum at every decomposition level. Hence the maximum rate of the projection is shown below.

MAFD differs from conventional Fourier decomposition models. For frequency analysis, signals are decomposed with MAFD purely according to the distribution of energy, which makes it possible to determine the overall frequency ranges together with their individual energy contributions.

MAFD is applied to the noisy signal, and the noise is effectively removed by using the Hilbert transform.

The analytic representation of the obtained noisy signal is determined as

EHVD decomposes a non-stationary signal into mono components whose frequencies and amplitudes vary over time. The amplitude variation of the signal is decomposed by considering the first components of the input signal. The main part of the mixture is obtained from the more complicated components with lower amplitude. The instantaneous frequency of the largest component is computed, and the mono components already extracted are subtracted from the input signal. Hence the EHVD decomposition of the signal s(t) is obtained by using the mathematical expression

The envelopes of the signal are represented as α(t) and β(t). The EHVD method uses the analytic representation of the input signal to compute the amplitude of the envelope. The signal is projected onto the more complicated respiratory components to obtain the PPG signal, which contains the lower-energy components of EHVD.
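The analytic-signal step that underlies Hilbert-based decomposition can be sketched as below. The code computes the analytic signal by zeroing the negative-frequency half of the spectrum and then derives the instantaneous amplitude (envelope) and instantaneous frequency. It illustrates only this building block, not the full EHVD iteration, and the naive DFT, the function names, and the synthetic 1.5 Hz test tone are assumptions chosen to keep the example free of external dependencies.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short illustrative signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def analytic_signal(x):
    """Return x + j*Hilbert(x) by suppressing the negative-frequency half of the spectrum."""
    n = len(x)
    spectrum = dft(x)
    gain = [0.0] * n
    gain[0] = 1.0                       # keep the DC term once
    if n % 2 == 0:
        gain[n // 2] = 1.0              # keep the Nyquist term once
        for k in range(1, n // 2):
            gain[k] = 2.0               # double the positive frequencies
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


def envelope_and_frequency(x, fs):
    """Instantaneous amplitude and frequency (Hz) derived from the analytic signal."""
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]
    # Phase increment between consecutive samples, from z[k+1] * conj(z[k]).
    freq = [cmath.phase(z[k + 1] * z[k].conjugate()) * fs / (2 * math.pi)
            for k in range(len(z) - 1)]
    return envelope, freq


fs = 20.0
x = [0.8 * math.cos(2 * math.pi * 1.5 * k / fs) for k in range(80)]  # 1.5 Hz, amplitude 0.8
envelope, freq = envelope_and_frequency(x, fs)
print(round(envelope[40], 3), round(freq[40], 3))
```

In a Hilbert vibration decomposition, the instantaneous frequency of the dominant component would next be low-pass filtered, the component reconstructed and subtracted, and the procedure repeated on the residual.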

IVMD is a fully intrinsic and adaptive technique that decomposes a signal into several modes with different centre frequencies, energies, and bandwidths. When the input signal is synthesized, each sub-signal has a particular sparsity and a centre frequency with narrow bandwidth. The parameters used to initialize the process include a representation of the modes. No universally appropriate value is prescribed for these parameters, since the choice depends on the application, and larger values make it difficult to estimate the centre frequencies accurately. Here the obtained PPG and BCG signals are decomposed into their corresponding frequency spectra. The decomposition of the noise signal is correlated with the noise signals

To validate the performance of the proposed model, a set of experiments is conducted on real-time video samples.

The video samples are taken from 25 participants (12 females and 13 males) aged 20 to 40 years. The video signals are collected by manually testing the participants with the HARR monitor. The subjects are asked to assemble in a separate hall at periodic intervals; the hall is equipped with everything needed for real-time observation. A pulse oximeter is used to record the reference heartbeat value; the exact value is obtained using BCG, and the respiration rate is monitored using PPG in addition to manual checking. Data are collected in a random manner by extracting about 10 frames per second for up to 10 min. The subjects are allowed to sit freely for 15 min, so that their head motion, facial reactions, and similar movements are recorded.

The efficiency of the proposed model is tested from different aspects. Initially, the PPG and BCG information is obtained from the video. The video is converted into frames using the HVS method. The information regarding the signal conversion is shown in

Total number of participants | Total running time of the video (s) | Total frames extracted | Total time consumed (s) | Frame rate (frames/s) |
---|---|---|---|---|
25 | 22500 | 828 | 58.3 | 14.2 |
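The figures in the table appear to be mutually consistent, which a short calculation can confirm. The sketch assumes that each of the 25 participants contributes one 15-minute recording (the sitting period mentioned in the data-collection description) and that the frame rate is the number of extracted frames divided by the time consumed; both readings are inferences from the numbers rather than statements in the text.

```python
participants = 25
minutes_per_participant = 15   # assumed: the 15 min sitting period per subject
total_video_s = participants * minutes_per_participant * 60

frames_extracted = 828
time_consumed_s = 58.3         # 5.83E+01 s in the table
frame_rate = frames_extracted / time_consumed_s

print(total_video_s)           # 22500
print(round(frame_rate, 1))    # 14.2
```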

Of the total information retrieved (i.e., 22500 s of video), only a part is considered for the analysis; most of the content is removed by smoothing and refinement. Because the video is captured in real time, it is heavily affected by noise, which can be removed with the help of the Kalman filter. The video signals are pre-processed before being fed into the Kalman filter. Most of the videos are captured with high-resolution cameras. After the videos are converted into frames, synchronization must be checked: the distance between frames is calculated to make sure that the spacing between frames is identical. The signal frames are then assigned to clusters or groups. The RGB signal generated after grouping is shown in

The groups of RGB signals from the video output are divided into frames using a suitable segmentation process. The signals must be detrended in order to estimate the exact RGB value. Since the signals are grouped, the frames need to be separated, so a form of synchronization is needed to combine the original signal with the grouped signal. The detrending process is illustrated in

After synchronization, the green signal must be extracted from the whole set of frames. Video frame separation is also referred to as green signal separation. Green signal separation supports the estimation of the exact value in the separated video frames and is illustrated in

From the above

The

The change in the peak value shows the effectiveness of the algorithm using the Kalman filter. The variation is assessed by comparing the estimated true value with the Kalman-filtered value. Smoothing makes it possible to determine the exact value of the information without noise. The axes are taken at different intervals of time and value. The comparison of the true value and the Kalman-filtered value is shown in

The noise-free signals are subjected to the Enhanced Hilbert Vibration Decomposition (EHVD) method, and the result obtained is illustrated in

The parameters are fixed, and the values are analysed in beats per minute for the heart rate and the respiration rate. Peaks are detected to identify the points where the pulse is most active. The values obtained are shown in

Total frames extracted | Frame rate | EHVD respiration rate | EHVD heartbeat rate |
---|---|---|---|
828 | 14.2 | 4.92 | 9.45 |
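The peak-based rate estimation mentioned above can be sketched as follows. The text does not specify the peak detector, so this example assumes simple local maxima with a minimum spacing between peaks and converts the mean peak interval into a rate per minute; the function names, the spacing parameters, and the synthetic 1.2 Hz and 0.25 Hz signals are hypothetical.

```python
import math


def find_peaks(signal, min_distance):
    """Indices of local maxima that are at least `min_distance` samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] >= signal[i + 1]:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks


def rate_per_minute(signal, fs, min_interval_s):
    """Average rate (events per minute) from the mean spacing between detected peaks."""
    peaks = find_peaks(signal, max(1, int(min_interval_s * fs)))
    if len(peaks) < 2:
        return 0.0
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fs
    return 60.0 / mean_interval_s


fs = 23.0                                    # frames per second, as stated in the text
t = [k / fs for k in range(int(20 * fs))]    # 20 s of samples
pulse = [math.sin(2 * math.pi * 1.2 * x) for x in t]     # 1.2 Hz, i.e. 72 per minute
breath = [math.sin(2 * math.pi * 0.25 * x) for x in t]   # 0.25 Hz, i.e. 15 per minute
heart_rate = rate_per_minute(pulse, fs, min_interval_s=0.4)
respiration_rate = rate_per_minute(breath, fs, min_interval_s=2.0)
print(heart_rate, respiration_rate)
```

On real, noisy signals the minimum spacing acts as a refractory period that suppresses spurious peaks, which is why it is set separately for the cardiac and respiratory bands.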

The Improved Variational Mode Decomposition (IVMD) method is then applied to determine the HARR value. The peak value determination shows that the estimation of the respiration rate and heartbeat rate is effective. This is illustrated in

The estimates are expressed in beats per minute along with the deterministic values. The values obtained from the experimental analysis of IVMD are shown in the

Total frames extracted | Frame rate | IVMD respiration rate | IVMD heartbeat rate |
---|---|---|---|
828 | 14.2 | 3.44 | 8.27 |

The Modified Adaptive Fourier Decomposition (MAFD) is used to estimate the heartbeat rate and the respiration rate. Here the peak values fall within an approximately constant range in many regions, and the estimated signals show a form of stability. The estimation of the HARR value using MAFD is illustrated in

In the overall analysis of the HARR values estimated with the three models, IVMD, MAFD, and EHVD, a small variation was identified. The comparison of the HARR values across the three models is shown in

Total frames extracted | Frame rate | MAFD respiration rate | MAFD heartbeat rate |
---|---|---|---|
828 | 14.2 | 2.17 | 5.60 |

From the above

Methods | Total frames extracted | Frame rate | Respiration rate | Heartbeat rate |
---|---|---|---|---|
MAFD | 828 | 14.2 | 2.17 | 5.60 |
IVMD | 828 | 14.2 | 3.44 | 8.27 |
EHVD | 828 | 14.2 | 4.92 | 9.45 |

In this research, a video-based HARR measuring framework is proposed based on combined PPG and BCG. Noise in the input video signals is removed using an Adaptive Kalman Filter (AKF). Three different algorithms are used to estimate the HARR from the noise-free input signals: the signals are subjected first to Modified Adaptive Fourier Decomposition (MAFD), then to Enhanced Hilbert Vibration Decomposition (EHVD), and finally to Improved Variational Mode Decomposition (IVMD), yielding three separate HARR results. The experimental analysis shows that EHVD yields better HARR values than IVMD and MAFD. The performance of the proposed model could be further improved with a better filter and decomposition algorithm.