Anomaly detection is extremely important for intelligent surveillance videos. Deep learning algorithms have become popular for evaluating real-time surveillance recordings of events such as traffic accidents and criminal or unlawful incidents, including suicide attempts. Nevertheless, deep learning methods for classification, such as convolutional neural networks, require a great deal of computing power. Quantum computing is a branch of technology that solves unusual and complex problems using quantum mechanics. As a result, the focus of this research is on developing a hybrid quantum computing model based on deep learning. This research develops a Quantum Computing-based Convolutional Neural Network (QCCNN) to extract features and classify anomalies in surveillance footage. A quantum circuit, namely the real amplitude circuit, is utilized to improve the performance of the model. To the best of our knowledge, this is the first work to employ quantum deep learning techniques to classify anomalous events in video surveillance applications. Thirteen anomaly classes are classified from the UCF-Crime dataset. Experimental results show that the proposed model efficiently classifies the data with respect to the confusion matrix, Receiver Operating Characteristic (ROC) curve, accuracy, Area Under the Curve (AUC), precision, recall, and F1-score. The proposed QCCNN attains a best accuracy of 95.65%, which is 5.37% higher than that of existing models. To measure the efficiency of the proposed work, QCCNN is also evaluated against both classical and quantum models.
Surveillance is quickly gaining popularity due to technological advances that can be used to safeguard lives and overcome security barriers. Closed-Circuit Television (CCTV) cameras are broadly utilized for monitoring and security events, as well as for supplying evidence to the surveillance system [
These cameras continually generate massive amounts of video data, and the manual monitoring this necessitates is both time-consuming and inaccurate, which calls for automated monitoring approaches. Because of the limited effectiveness of human surveillance, law enforcement authorities have a difficult time catching or averting unusual situations. A computer vision-based system that can successfully identify normal or anomalous occurrences without human intervention is needed to detect anomalous activity. Such an automated system not only aids monitoring but also decreases the amount of human labor necessary to sustain 24-hour manual observation [
Researchers have developed a few new strategies for detecting anomalies in surveillance footage [
The majority of the above strategies suffer from a high false alarm rate. Moreover, while these strategies function very effectively on simple datasets, their efficiency is limited when dealing with real-life events. In addition, the computational cost of detecting and classifying multiple anomaly types is high. The accuracy of anomaly detection should therefore be further improved.
Quantum computing has emerged as a promising field that may help solve this issue through drastically new architectures. As a result, research and development of new deep learning algorithms relying on quantum computers is critical to keep up with potential AI achievements.
Quantum computing is a fundamentally new computing model that relies on quantum physics rather than classical physics. Instead of traditional bits, quantum computing employs quantum bits, or qubits, which exploit the superposition and uncertainty inherent in quantum physics [
Some works used quantum machine learning for image classification tasks [
The key contributions can be summarized as follows.
To propose a quantum deep learning-based model named Quantum Computing based Convolutional Neural Network (QCCNN) for anomaly detection and classification in surveillance videos, and to obtain better performance through a quantum circuit, namely the real amplitude circuit.
The results of testing the proposed model on the benchmark UCF-Crime dataset show better performance than existing works. Concerning accuracy, ROC, AUC, precision, recall, and F1-score, the proposed QCCNN surpasses existing techniques.
The rest of this work is organized as follows. Section 2 summarizes the related research on anomalous behavior recognition techniques. Section 3 introduces the proposed abnormal event detection approach, as well as the quantum computing-based deep learning approach. Section 4 contains the experimental results. Section 5 concludes the work.
Deep learning using a homomorphic encryption technique has been introduced in [
A new anomaly detection technique is proposed in [
In [
To detect unusual occurrences in surveillance cameras, a novel structure combining ResNet-50 and ConvLSTM was proposed in [
The authors of [
For medical image classification, two models, Quantum Orthogonal Neural Networks (QONN) and Quantum-assisted Neural Networks (QANN), were proposed in [
Variational quantum circuits with deep reinforcement learning were researched by Chen et al. (2020) [
This section describes the proposed work in detail. Using a quantum machine learning model, anomalies in real-time surveillance videos are detected and classified. The proposed Quantum Computing based Convolutional Neural Network (QCCNN) is detailed in the subsections below.
The flow of the proposed work is given in
The dataset on which the proposed method was evaluated [
No. of videos  Anomaly 

50 (48)  Abuse 
150 (127)  Road accidents 
50 (45)  Arrest 
50 (41)  Arson 
150 (145)  Robbery 
50 (47)  Assault 
50 (27)  Shooting 
100 (87)  Burglary 
50 (29)  Shoplifting 
50 (29)  Explosion 
100 (95)  Stealing 
50 (45)  Vandalism 
50 (45)  Fighting 
950 (800)  Normal events 
This step includes extracting frames from a video sequence. Each obtained video frame is then resized according to the input size of the model used in the subsequent feature extraction steps. To be compatible with the proposed procedure, the UCF-Crime dataset is split into many video frames, after which each frame is standardized to 224 × 224 × 64. Preprocessing an image enables it to be fed into the classification model. An identical set of procedures was used in the testing phase of this research: since the proposed model was tested on videos, the frames of each video were iterated over during testing and subjected to the same preprocessing as the training images.
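The frame-standardization step above can be sketched as follows. This is a minimal numpy-only illustration (in practice a library such as OpenCV would perform the resize), and `resize_frame` is a hypothetical helper, not code from the proposed system.

```python
import numpy as np

def resize_frame(frame: np.ndarray, out_h: int = 224, out_w: int = 224) -> np.ndarray:
    """Nearest-neighbour resize of a single video frame (H x W x C)."""
    in_h, in_w = frame.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return frame[rows][:, cols]

# Example: standardize a 480x640 RGB frame to the model's 224x224 input size.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
resized = resize_frame(frame)
print(resized.shape)  # (224, 224, 3)
```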
In quantum computers, qubits [
The state
Quantum measurement is an irreversible process in which information about a single qubit’s state is obtained while the superposition is destroyed. In terms of Hilbert space,
Quantum gates, denoted by the letter Z, are the fundamental building blocks of quantum circuits and operate on a finite number of qubits. They serve as the foundation of quantum circuits in the same way that classical logic gates serve as the foundation of ordinary digital circuits. Quantum gates are unitary operators and can be denoted as Z
The typical quantum gates utilized here are discussed below.
The Hadamard gate, a one-qubit gate, can be defined as,
This gate produces the superposition of two states, in particular the plus state, beginning with the single-state qubit
Rotation gates,
The CNOT gate, a two-qubit gate, can be defined as,
Whenever the input consists of the fundamental states
That is, it flips the second qubit (the target qubit) only when the first qubit (the control qubit) is
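The gate definitions above can be checked numerically. The sketch below builds the Hadamard and CNOT matrices with numpy and verifies their action on basis states; it is an illustration of the standard gate algebra, not part of the proposed model's code.

```python
import numpy as np

# One-qubit basis states |0> and |1> as vectors.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Hadamard gate: maps |0> to the equal superposition (|0> + |1>)/sqrt(2).
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# CNOT gate on two qubits (control first): flips the target only when control is |1>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

plus = H @ ket0                  # the plus state (|0> + |1>)/sqrt(2)
state_10 = np.kron(ket1, ket0)   # two-qubit state |10>
state_11 = CNOT @ state_10       # CNOT flips the target: |10> -> |11>

print(np.round(plus.real, 3))
print(np.round(state_11.real, 3))
```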
A parameterized quantum circuit is generally utilized as a hidden layer to construct a hybrid QNN. However, in the case of classical network topologies, realizing a quantum representation of high-dimensional classical data while building the hybrid model is crucial for incorporating the quantum element into the classical design. A basic discussion of how to create a quantum state is provided in this work.
A unitary operator is first used to process a feature mapping. This is applied to a group of N |0⟩ quantum nodes as a method for encoding classical data into the N-qubit space. Before applying it to the quantum circuit, a unitary matrix is classically constructed; its parameters are defined by the preceding classical node values at the moment of insertion. The preceding classical activation is reflected by the amplitude probability of measuring |1⟩ in the resulting quantum state, a process known as data embedding.
After the classical encoding process, a parameterized quantum circuit is used. A parameterized quantum circuit is one in which the rotation angle of every gate is defined with the help of elements of a classical (input) vector. The output from a prior layer of the neural network is gathered and used as input to the parameterized circuit. The measured data from the quantum circuit is then given to the next hidden layer.
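As a rough illustration of this classical-to-quantum hand-off, the toy sketch below encodes each classical activation as a single-qubit R_y rotation angle, applies a trainable rotation, and returns Pauli-Z expectations. It is a one-qubit-at-a-time simplification for intuition only, not the paper's actual circuit.

```python
import numpy as np

def ry(theta: float) -> np.ndarray:
    """Single-qubit Y-rotation gate R_y(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def quantum_layer(inputs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy parameterized circuit: per qubit, encode a classical input as an
    R_y rotation angle, apply a trainable R_y, and measure the Z expectation."""
    outputs = []
    for x, w in zip(inputs, weights):
        state = ry(w) @ ry(x) @ np.array([1.0, 0.0])  # start from |0>
        p0, p1 = state[0] ** 2, state[1] ** 2          # measurement probabilities
        outputs.append(p0 - p1)                        # expectation of Pauli-Z
    return np.array(outputs)

# Classical activations from the previous layer become rotation angles:
# an input of 0 leaves |0> (Z expectation 1), an input of pi flips to |1> (-1).
print(quantum_layer(np.array([0.0, np.pi]), np.zeros(2)))
```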
A real amplitude circuit (Quantum circuit) is selected to be used in the proposed QCCNN [
As seen in
This section describes how anomalies are classified using the proposed QCCNN. Initially, the preprocessed input data is fed to the CNN layers. Three convolutional and pooling layers are responsible for extracting the spatiotemporal features from the video frames. The output of the max-pooling layers (the pooled feature maps) is then flattened into a vector, which is given to the quantum circuit. This is shown in
Convolutional layer
The convolution layer is the most significant component of a CNN. It is the first layer of a CNN and extracts features from the input (X × Y) with the help of a convolution filter (also called the kernel). From the input image, kernels learn parameters (such as weights) through forward and backward propagation. To accomplish this task, a filter of size 3 × 3 is slid across the input matrix with stride 1. At each step, element-wise matrix multiplication is performed and the results are summed. The result is a feature map.
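The sliding-window computation described above can be sketched as follows: a minimal "valid" 2-D convolution with a 3 × 3 filter and stride 1, where `conv2d` is an illustrative helper rather than the model's implementation.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Valid 2-D convolution: slide the kernel, multiply element-wise, sum."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply and sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                   # 3x3 averaging filter, stride 1
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (3, 3)
```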
The neuron value at position (x, y) in the s^{th} feature map of layer r can be given as,
In the above equation, t denotes the feature map in the (r − 1)^{th} layer linked to the present (s^{th}) feature map,
Activation function
Because most real-world data is nonlinear, activation functions are utilized to perform nonlinear transformations of the data. They ensure that the input space is mapped to a separate output space that meets the requirements. CNN uses a nonlinear activation function called the ReLU, whose output is a continuous variable: when the input is negative, the output equals 0; otherwise, the output is the same as the input. The ReLU function [
Another prominent reason to utilize ReLU when training a CNN is that it is much quicker than its competitors (sigmoid and tanh). All negative inputs are set to zero in ReLU, meaning that many nodes are ignored and will never be evaluated in subsequent training.
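A minimal sketch of the ReLU behavior described above:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """ReLU: zero for negative inputs, identity for non-negative inputs."""
    return np.maximum(0, x)

# Negative inputs are clipped to 0; non-negative inputs pass through unchanged.
print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
```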
A pooling layer is added between successive convolutional layers and is not trainable. The goal of introducing a pooling layer is to lower the number of parameters and the network’s computing cost. This layer works independently on each input slice to resize it spatially. A filter of size 2 × 2 with a stride of 2 is the most common sort of pooling layer; each depth slice of the input is downsampled by a factor of two in both height and width at each step. Pooling operations include max-pooling, average pooling, and min-pooling, among others. The CNN model employs max-pooling, computed over every 2 × 2 region in a depth slice.
To determine the output dimension of the max-pooling operation, the following mathematical expression can be utilized [
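Assuming the standard output-size formula (W − F)/S + 1 with filter size F = 2 and stride S = 2, max pooling can be sketched as follows (`maxpool2d` is an illustrative helper):

```python
import numpy as np

def maxpool2d(x: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """2x2 max pooling with stride 2: keeps the largest value per region."""
    oh = (x.shape[0] - size) // stride + 1   # output height: (W - F)/S + 1
    ow = (x.shape[1] - size) // stride + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 0, 8]], dtype=float)
print(maxpool2d(x))  # [[6. 4.]
                     #  [7. 9.]]
```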
The proposed model includes two fully connected layers, one placed before the quantum layer and the other after it. These layers modify the quantum layer’s input and output sizes so that they correspond to the number of classes required by the chosen dataset. In other words, the purpose of these two classical neural layers is to allow the coexistence of classical and quantum layers in the structure of the model. They also enable data embedding from the image space to the quantum space. In fully connected layers, all nodes of a layer are linked to all nodes of the succeeding layer to make decisions (classification, feature extraction); this helps determine the general relationships among the features. We add two fully connected nonlinear layers to guarantee that these nodes communicate smoothly and account for all potential dependencies at the feature level.
In terms of the quantum component, the quantum layer (quantum circuit) seeks to profit from the qualities of probabilistic quantum computing. It detects and classifies the anomaly types from the extracted feature vector. The classified anomaly types include Burglary, Fighting, Abuse, Robbery, Arson, Shooting, Assault, Shoplifting, Explosion, and Vandalism.
The performance standards as well as the empirical results of the proposed QCCNN are discussed in this section. The proposed model’s effectiveness is quantitatively assessed via well-established metrics such as the confusion matrix, recall, F1-score, precision, accuracy, ROC, and AUC.
The models were trained on Google Colaboratory using Python’s sklearn package (version 1.0.2), where each user may expect: 1) a Tesla K80 GPU with 2496 CUDA cores, compute capability 3.7, and 12 GB of GDDR5 VRAM; 2) a single-core hyperthreaded CPU (one core, two threads), Xeon processor @ 2.20 GHz (no Turbo Boost); 3) 45 MB of cache; 4) 12.4 GB of accessible RAM; and 5) 320 GB of available storage. Every QCCNN classifier, independent of the circuit used, was trained for 100 epochs with the Adam optimizer and a learning rate of 0.0001 using cross-entropy as the loss function. The CNNs were trained in the same manner, but they took 150 epochs to converge. The simulation parameters are provided in
Parameter  Value 

Epoch  150 
Learning rate  0.0001 
Optimizer  Adam 
Activation function  ReLU 
Number of layers  10 
Number of neurons  1000 
The confusion matrix is regarded as an excellent, yet straightforward, objective measure that is helpful for any classification scheme. It provides a detailed understanding of how well the classifier is doing, and it is crucial to keep track of this while evaluating any classifier. As a result, this section gives an efficiency assessment of the proposed QCCNN classifier using the confusion matrix.
The confusion matrices derived from the dataset using the proposed technique are shown in
In assessing classification performance, the following quantities are taken into account: true positives (TP), instances correctly predicted as positive; false positives (FP), instances predicted as positive that are actually negative; false negatives (FN), instances predicted as negative that are actually positive; and true negatives (TN), instances correctly predicted as negative.
Accuracy can be formulated as,
The proposed QCCNN model is compared with classical models such as CNN [
Models  Accuracy  Precision  Recall  F1-score 

Proposed QCCNN  95.65  0.86  0.98  0.92 
CNN [  85.12  0.84  0.85  0.84 
AutoEncoder [  83.84  0.79  0.72  0.75 
GoogleNet with OC-SVM [  81.75  0.82  0.78  0.79 
HOG with OC-SVM [  78.15  0.75  0.83  0.78 
RGB and flow two-stream networks [  86.41  0.85  0.76  0.80 
The hybrid QCCNN has the ability to learn finer information to classify images from the dataset with less complexity. The quantum gates can influence the final results and help speed up certain computational processes. The accuracy of QCCNN is 10%, 11.81%, 13.9%, 17.5%, and 9.24% higher than that of CNN, the autoencoder, GoogleNet with OC-SVM, HOG with OC-SVM, and the RGB and flow two-stream networks, respectively. The F1-score of the proposed QCCNN is also high, which shows the efficiency of the model in classifying anomaly types from surveillance video.
As the complexity of the problem grows, the speed requirements of machine learning models also increase. Compared to classical models, the proposed quantum machine learning model solves more complex problems (involving huge volumes of data) faster. This is because quantum superposition states allow all possible combinations of a set of bits to be manipulated in a single operation, which speeds up the models relative to classical ones. The proposed QCCNN uses the quantum properties of the quantum circuit to reduce model training time. Entanglement in quantum computing also aids in the automatic determination of hyperparameters. Compared to other models, quantum circuits run faster and produce more accurate results; in a noisy environment, they also produce accurate results with a fixed memory dimension.
In this work, the proposed QCCNN and existing models such as Quantum Generative Adversarial Network (QGAN) [
The authors of [
Models  Accuracy (%) 

Proposed QCCNN  95.65 
QGAN  93.39 
HAE  91.06 
QANN  89.10 
QONN  87.54 
Models  Precision  Recall  F1score 

Proposed QCCNN  0.86  0.98  0.92 
QGAN  0.85  0.92  0.88 
HAE  0.83  0.93  0.87 
QANN  0.84  0.87  0.85 
QONN  0.81  0.83  0.81 
No. of layers  Accuracy  Precision  Recall  F1-score 

1 Convolutional  92.67  0.68  0.74  0.79 
2 Convolutional  93.78  0.73  0.82  0.85 
3 Convolutional  94.56  0.89  0.87  0.89 
4 Convolutional  93.17  0.79  0.73  0.81 
1 Max pooling  89.48  0.71  0.72  0.78 
2 Max pooling  91.36  0.75  0.74  0.82 
3 Max pooling  92.97  0.82  0.79  0.87 
4 Max pooling  91.58  0.78  0.73  0.84 
1 Average pooling  83.68  0.68  0.69  0.73 
2 Average pooling  85.37  0.71  0.71  0.79 
3 Average pooling  86.58  0.76  0.76  0.82 
4 Average pooling  84.47  0.72  0.70  0.80 
Models  Precision  Recall  F1score 

Proposed QCCNN  0.86  0.98  0.92 
Quantum circuit 1 + CNN  0.63  0.70  0.66 
Quantum circuit 2 + CNN  0.62  0.65  0.63 
Quantum circuit 3 + CNN  0.70  0.63  0.66 
Quantum circuit 4 + CNN  0.67  0.74  0.70 
Quantum circuit 1 + AutoEncoder  0.41  0.58  0.48 
Quantum circuit 2 + AutoEncoder  0.46  0.52  0.47 
Quantum circuit 3 + AutoEncoder  0.50  0.55  0.52 
Quantum circuit 4 + AutoEncoder  0.58  0.62  0.59 
Method  AUC (%) 

Proposed QCCNN  96.63 
QGAN  95.32 
HAE  94.45 
QANN  92.18 
QONN  86.89 
Method  RMSE: Old dataset (UCF-Crime) [  RMSE: Unseen dataset (ShanghaiTech) [ 

Proposed QCCNN  0.61  0.57 
QGAN  0.75  0.78 
HAE  0.83  0.81 
QANN  0.89  0.93 
QONN  0.92  0.91 
From
Recall can be defined as the proportion of actual positives correctly identified by the model, while precision addresses how often the model is correct when it predicts yes. To fully evaluate the model’s performance, it is necessary to examine precision as well as recall. Here, the F1-score, the harmonic mean of recall and precision, can be employed; the greater the score, the better the model. Accuracy denotes the percentage of correct predictions generated by the model.
Recall, precision, and F1-score can be formulated as given in
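These metrics can be computed directly from the confusion-matrix counts. The sketch below uses purely illustrative counts, not figures from the paper's experiments.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)                       # TP / (TP + FP)
    recall = tp / (tp + fn)                          # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Illustrative counts only (not taken from the paper's experiments).
m = classification_metrics(tp=90, fp=15, fn=2, tn=60)
print({k: round(v, 3) for k, v in m.items()})
```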
The following experiments were made for the proposed QCCNN with different numbers of convolutional and pooling layers, as shown in
Adding more layers aids in the extraction of deeper features, but only up to a certain limit; beyond it, the network tends to ‘overfit’ the data instead of extracting features. Overfitting can result in errors of various types, such as false positives. We tested the proposed system’s performance by increasing the number of layers from one to four. While adding layers, the performance gradually improves; when three layers are used, the proposed QCCNN produces the best results. Beyond that, adding layers reduces performance due to overfitting, so the model is fitted with three layers. Since the average pooling approach smooths the image, sharp features may be lost. Max pooling chooses the brightest pixels in the image, which is helpful when the background of the image is dark and we are only interested in the lighter pixels. Average pooling cannot always extract the important features because it considers everything and returns an average value that may or may not be significant, whereas max pooling concentrates on the most important features. Even though max-pooling is a nonlinear operation, it is primarily used to decrease the dimensionality of the input, thereby reducing overfitting and computation. As a result, in our proposed algorithm, max pooling outperforms average pooling.
The ROC curve is a probability graph that represents a classification model’s performance across all classification thresholds. It plots the true positive rate against the false positive rate. The higher the true positive rate, the better the detection efficiency; the lower the false positive rate, the steeper the ROC curve.
The AUC is a classification strategy measure. After accuracy, it is the second most common statistic.
A trustworthy analysis should provide an appropriate prediction (one with the least amount of error) for any type of test input. To validate the proposed model’s reliability, the Root Mean Square Error (RMSE), a common measure of a model’s error in forecasting quantitative data, is used. Its formal definition is as follows:
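A minimal implementation of this definition, with illustrative values rather than the paper's predictions:

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root Mean Square Error: square root of the mean squared prediction error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative ground-truth and predicted scores only.
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7, 1.0])
print(round(rmse(y_true, y_pred), 3))
```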
According to
The Population Stability Index (PSI) is used to determine the stability of the proposed model by assessing how the population or features have changed within the framework of the model. It measures how much a variable’s distribution has shifted between two samples over time. It is extensively used for monitoring changes in population characteristics and detecting potential problems with model performance, and it is often a useful indicator of whether the model has stopped forecasting effectively owing to large changes in the population distribution.
Using the following equation, calculate the PSI for each kind of anomaly:
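Assuming the standard PSI formula, PSI = Σ over bins of (expected% − predicted%) · ln(expected% / predicted%), a minimal sketch with illustrative distributions (not the paper's exact figures) is:

```python
import numpy as np

def psi(expected, predicted) -> float:
    """Population Stability Index over matching probability bins:
    sum of (expected% - predicted%) * ln(expected% / predicted%)."""
    expected = np.asarray(expected, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sum((expected - predicted) * np.log(expected / predicted)))

# Illustrative per-class anomaly distributions (hypothetical values).
expected = [0.25, 0.25, 0.25, 0.25]
predicted = [0.30, 0.20, 0.25, 0.25]
print(round(psi(expected, predicted), 4))
```

A small PSI (conventionally below 0.1) indicates a stable population; identical distributions give a PSI of exactly zero.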
Various actions may be taken depending on the outcome. If the outcome is satisfactory, no further action is needed; if not, additional detailed analysis is required, based on whether an overall score or an individual score component was evaluated for PSI. The PSI for the QCCNN model was estimated in
Type of anomaly  Expected anomaly probability  Predicted anomaly probability  Difference  ln(Expected/Predicted)  Product 

1  0.267  0.385  −0.118  −0.366  0.043 
2  0.165  0.144  0.021  0.136  0.003 
3  0.125  0.139  0.264  −0.106  −0.028 
4  0.219  0.060  0.159  −1.01  −0.160 
5  0.224  0.272  −0.048  −0.194  0.009 
Total  1.000  1.000      −0.133 
This work investigates a hybrid quantum computing-based deep learning model for anomaly classification in surveillance videos. Unlike a traditional CNN, this work uses a quantum circuit as one of the layers of the CNN for multi-class anomaly classification. This increases overall accuracy and reduces the computational cost of the proposed QCCNN model. Furthermore, experimental findings on the UCF-Crime dataset reveal that the QCCNN outperforms existing models in the confusion matrix, accuracy, precision, recall, F1-score, ROC, and AUC. The accuracy of the QCCNN is 95.65%, which is 2.26%, 4.59%, 6.55%, and 8.11% greater than that of QGAN, HAE, QANN, and QONN, respectively. As a result, the proposed QCCNN outperforms other current models in overall classification accuracy by about 5.37%. In addition, the proposed method’s effectiveness is tested by altering the number of layers in the CNN and comparing it with quantum and classical models. Future studies will seek to increase the quantum processing component’s contribution to the hybrid approach. Furthermore, more sophisticated quantum circuits are expected to improve the model’s learning capabilities.
The authors wish to express their thanks to one and all who supported them during this work.
The authors received no specific funding for this study.
The authors declare that they have no conflicts of interest to report regarding the present study.