In the era of big data, traditional regression models cannot handle uncertain big data efficiently and accurately. To address this deficiency, this paper proposes a quantum fuzzy regression model, which uses fuzzy theory to describe the uncertainty in big data sets and uses quantum computing to exponentially improve the efficiency of data set preprocessing and parameter estimation. Data envelopment analysis (DEA) is used to calculate the degree of importance of each data point, and the Harrow, Hassidim and Lloyd (HHL) algorithm and quantum swap circuits are used to speed up computations on high-dimensional data matrices. Applied to small-scale financial data, the quantum fuzzy regression model achieves considerably higher accuracy than the quantum regression model. Moreover, because it uses quantum computing, it offers an exponential speedup over the fuzzy regression model when processing high-dimensional data matrices. The proposed model thus combines the advantages of fuzzy theory and quantum computing: it computes on high-dimensional data matrices and completes parameter estimation efficiently with quantum computing while retaining the uncertainty in big data. It is therefore a new model for efficient and accurate big data processing in uncertain environments.

With the development of information technology, big data security and computing efficiency have become essential research directions. Work on big data security includes, for example, digital continuity and industrial applications [

To compensate for the shortcomings of traditional regression models in processing uncertain data, Tanaka first proposed fuzzy regression models [

Fuzzy regression models handle the uncertainty in data sets well, but as the dimension of the data matrix grows, classical computers face a computing-power bottleneck. Quantum computers, which compute by operating on quantum states, have been proposed as a way to break through this bottleneck, offering exponential acceleration for certain problems. Applications of quantum algorithms in machine learning have shown remarkable performance, for example the HHL algorithm for solving linear equations and for fitting the parameters of traditional regression models [

In recent years, much theoretical progress has been made on quantum regression models and on fuzzy regression models for big data, but little research has addressed the intersection and integration of the two. Given its theoretical and practical significance, the efficient and accurate processing of uncertain big data with regression models is likely to become a research focus in the field of big data.

This thesis introduces a novel regression model, the “quantum fuzzy regression model”, to address the low efficiency and low accuracy of existing models. The quantum fuzzy regression model improves on existing models by combining fuzzy membership degrees with quantum algorithms. The fuzzy membership degrees are determined by data envelopment analysis (DEA) [

Since regression models were established, one line of research has used fuzzy theory to describe the uncertainty in data sets and so increase the accuracy of predictions. Another has increased computational power to speed up parameter estimation. This thesis first briefly introduces the matrix-based parameter solution of the traditional regression model and the parameter solution of the regression model based on fuzzy point data, and then reviews quantum regression models in quantum machine learning, which provide the theoretical basis for the quantum fuzzy regression model.

The process of a linear regression model is shown below. Firstly, a data set with

The data set satisfies the relationship shown in

As

The least square method is used to predict the parameters. Loss function:

When
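The least-squares estimate described above can be sketched in a few lines. The sketch assumes the usual matrix notation, a design matrix X whose first column is all ones and a response vector y, so that minimizing the squared-error loss leads to the normal equations (X^T X) β = X^T y. The helper names `solve` and `least_squares` are hypothetical and only illustrate the computation; they are not the paper's code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(x_rows: Matrix, y: Vector) -> Vector:
    """Estimate beta minimizing sum_i (y_i - x_i . beta)^2 via (X^T X) beta = X^T y."""
    p = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Usage: y = 1 + 2x exactly, with an intercept column of ones.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]
beta = least_squares(X, Y)
print([round(b, 6) for b in beta])  # [1.0, 2.0]
```

Solving the normal equations directly is the textbook route; for ill-conditioned design matrices a QR factorization is numerically more robust.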

In practice, data points differ in importance: some data points in a data set matter more than others. Traditional regression models, however, cannot distinguish the importance of individual data points. According to the definitions of the data points in [

Before explaining the fuzzy linear regression model, this paper first explains the standard methods to determine the fuzzy membership degrees for each training data point

Firstly, data sets with fuzzy membership degrees

The fuzzy regression model based on fuzzy point data is shown in

The independent variable matrix

This paper makes

When

Predicted results on real data have verified that introducing fuzzy membership degrees improves fuzzy regression models: they fit the data better and better reflect the importance of each data point in the fit.
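A minimal sketch of fuzzy point data regression with one predictor, assuming (as in the Hadamard-product construction this paper later applies to the CSI 500 data) that each row of the variable matrix and each response value is multiplied by the data point's fuzzy membership degree s_i before ordinary least squares. Under that assumption the fit is weighted least squares with weights s_i², which has a closed form. The function name `fuzzy_point_fit` and the toy data are hypothetical.

```python
from typing import List, Tuple


def fuzzy_point_fit(xs: List[float], ys: List[float], s: List[float]) -> Tuple[float, float]:
    """Fit y ~ a + b*x after scaling every data point by its membership degree s_i.

    Multiplying the row [1, x_i] and the response y_i by s_i (a Hadamard product with
    the membership vector) and then running ordinary least squares minimizes
    sum_i s_i**2 * (y_i - a - b*x_i)**2, i.e. weighted least squares with w_i = s_i**2.
    """
    w = [si * si for si in s]  # effective weights after scaling by s_i
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, xs, ys))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data: y = 1 + 2x, except that the last point is an outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 20.0]
plain = fuzzy_point_fit(xs, ys, [1.0] * 5)                   # ordinary least squares
fuzzy = fuzzy_point_fit(xs, ys, [1.0, 1.0, 1.0, 1.0, 0.0])  # outlier gets membership 0
print(plain, fuzzy)
```

With membership 0 for the outlier the fit recovers y = 1 + 2x, while giving every point membership 1 (ordinary least squares) pulls the slope up to about 4.2.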

Classical regression models lack sufficient computational power for high-dimensional data, so quantum regression models have been introduced to optimize them. The key difference from the classical regression model is that quantum circuits are introduced to accelerate the parameter estimation and data set processing stages.

In 2012, Wiebe first proposed a quantum regression model, which can efficiently determine the least-squares fitting parameters on big data [

First, the quantum computer uses the input matrices

In the quantum encoding circuit of the matrix

Since
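Amplitude encoding, as used to load a data vector (for example a column of the input matrix) into a quantum register, can be illustrated classically: the vector is zero-padded to a power-of-two length and normalized, and its entries become the amplitudes of a ceil(log2 n)-qubit state. This sketch only computes the target amplitudes; it does not build the state-preparation circuit, and the function name `amplitude_encode` is hypothetical.

```python
import math
from typing import List, Tuple


def amplitude_encode(values: List[float]) -> Tuple[int, List[float]]:
    """Return (number of qubits, amplitudes) for the amplitude encoding of `values`.

    The vector is zero-padded to the next power of two and divided by its Euclidean
    norm, so the squared amplitudes sum to 1 as required for a quantum state.
    """
    if not values:
        raise ValueError("cannot encode an empty vector")
    n_qubits = max(1, math.ceil(math.log2(len(values))))
    padded = list(values) + [0.0] * (2 ** n_qubits - len(values))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0.0:
        raise ValueError("the zero vector cannot be encoded as a quantum state")
    return n_qubits, [v / norm for v in padded]


n, amps = amplitude_encode([3.0, 4.0, 0.0])
print(n, amps)  # 2 [0.6, 0.8, 0.0, 0.0]
```

The logarithmic qubit count is the source of the claimed efficiency: n values need only about log2 n qubits, although preparing an arbitrary state can itself be costly.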

According to the method of solving parameters in the traditional linear regression models, as shown in

In the diagram, the quantum circuit comprises H gates, U gates, R gates and Fourier transform (FT) modules. H gates are Hadamard gates that create superpositions. U gates are universal unitary gates that carry the information of the data sets; in HHL they implement the controlled evolution e^{iAt} required by phase estimation. R gates are rotation gates that encode the eigenvalues of A. The FT and FT^{*} modules are the quantum Fourier transform (QFT) circuit and its inverse.

Firstly, the HHL algorithm constructs the circuit as shown in

Finally, the output of the HHL algorithm is taken as the final result of the parameter estimation for data fitting. According to the research in [

Existing fuzzy and quantum regression models cannot handle the uncertainty in big data efficiently and accurately. The quantum fuzzy linear regression model proposed in this paper uses fuzzy membership degrees to describe uncertainty and quantum circuits to accelerate data preprocessing and parameter estimation, so it can process big data efficiently and accurately.

First, the quantum fuzzy regression model uses the DEA algorithm, specifically the Banker, Charnes and Cooper (BCC) model, to calculate the fuzzy membership degrees of each training point
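The BCC model is a linear program and is normally solved with an LP solver. The sketch below simplifies it to one positive input and one output per decision-making unit (DMU), in which case an optimal solution of the input-oriented BCC envelopment problem uses at most two DMUs with nonzero weight, so a brute-force search over pairs suffices. This is an illustrative simplification, not the paper's implementation: the paper's model uses several input indicators, and the function name `bcc_input_efficiency` and the data are hypothetical. The resulting efficiency scores, which lie in (0, 1], are what would serve as fuzzy membership degrees.

```python
from typing import List


def bcc_input_efficiency(xs: List[float], ys: List[float], o: int) -> float:
    """Input-oriented BCC (variable returns to scale) efficiency of DMU `o`.

    Solves  min theta  s.t.  sum_j lam_j x_j <= theta * x_o,
                              sum_j lam_j y_j >= y_o,
                              sum_j lam_j = 1,  lam_j >= 0,
    for one positive input x and one output y. With two structural constraints an
    optimal solution uses at most two DMUs, so every pair (i, j) is tried with
    lam * DMU_i + (1 - lam) * DMU_j.
    """
    best = float("inf")
    n = len(xs)
    for i in range(n):
        for j in range(n):
            lo, hi = 0.0, 1.0
            d = ys[i] - ys[j]
            r = ys[o] - ys[j]
            # Feasibility of the output constraint: lam * d >= r.
            if d > 0:
                lo = max(lo, r / d)
            elif d < 0:
                hi = min(hi, r / d)
            elif r > 0:
                continue  # both DMUs produce less output than DMU o
            if lo > hi:
                continue
            # The input used is linear in lam, so its minimum is at an endpoint.
            for lam in (lo, hi):
                best = min(best, lam * xs[i] + (1 - lam) * xs[j])
    return best / xs[o]


# Hypothetical data: four DMUs with one input and one output each.
inputs = [2.0, 4.0, 6.0, 5.0]
outputs = [1.0, 3.0, 4.0, 2.0]
memberships = [bcc_input_efficiency(inputs, outputs, k) for k in range(len(inputs))]
print(memberships)  # [1.0, 1.0, 1.0, 0.6]
```

DMU 3 is dominated by an even mix of DMUs 0 and 1, which produces output 2 with input 3 instead of 5, hence its score of 0.6. Negative-valued indicators, such as a price change, would need to be shifted to positive values before DEA.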

The data sets

Following the method of fuzzy point data, the quantum state generated by amplitude encoding contains both the data sets and the fuzzy membership degrees. To combine the fuzzy membership degrees obtained by DEA with the data sets, the results

With the quantum swap circuit above, the inner product of two quantum states can be obtained by measuring a single qubit, which matches how fuzzy membership degrees are applied in the fuzzy regression model based on fuzzy point data. The procedure of the swap test is shown in

In the swap test circuit, the probability that measuring the ancillary qubit yields
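The swap-test relation can be checked with a small state-vector calculation. For normalized states |a⟩ and |b⟩, an ideal swap test gives P(0) = (1 + |⟨a|b⟩|²)/2, so the squared inner product can be estimated as 2P(0) - 1 from repeated measurements of the ancilla. The sketch computes P(0) exactly from the post-circuit amplitudes instead of sampling shots; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def kron(a: Sequence[complex], b: Sequence[complex]) -> List[complex]:
    """Tensor (Kronecker) product of two state vectors: |a>|b>."""
    return [ai * bj for ai in a for bj in b]


def swap_test_p0(a: Sequence[complex], b: Sequence[complex]) -> float:
    """Probability that an ideal swap test leaves the ancilla in |0>.

    H on the ancilla, a controlled-SWAP of the two registers and a second H leave
    the ancilla-|0> branch equal to (|a>|b> + |b>|a>) / 2.
    """
    ab, ba = kron(a, b), kron(b, a)
    return sum(abs(x + y) ** 2 for x, y in zip(ab, ba)) / 4


def overlap_squared(a: Sequence[complex], b: Sequence[complex]) -> float:
    """|<a|b>|^2 computed directly, for comparison."""
    return abs(sum(complex(x).conjugate() * y for x, y in zip(a, b))) ** 2


a = [1.0, 0.0]
b = [1 / math.sqrt(2), 1 / math.sqrt(2)]
p0 = swap_test_p0(a, b)
print(p0, 2 * p0 - 1, overlap_squared(a, b))  # approximately 0.75, 0.5, 0.5
```

Identical states give P(0) = 1 and orthogonal states give P(0) = 0.5, so the ancilla statistics alone reveal the overlap.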

These operations yield quantum-state data sets that contain the fuzzy membership degrees. Such fuzzy quantum-state data sets describe the data characteristics better and fit complex big data in uncertain environments more closely.

This paper applies the HHL algorithm to the fuzzy quantum-state data sets using the phase estimation circuit, as shown in

After the phase estimation step, the state of the additional (ancillary) qubit changes from

The inverse of the phase estimation is then performed and the auxiliary qubit is measured. When the measurement outcome is 1, register Q2 holds the final solution, which is analyzed in the runtime analysis, as
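The end-to-end effect of HHL can be emulated classically for a small Hermitian system. Assuming ideal (exact) phase estimation, the controlled rotation leaves amplitude C/λ_j on each eigenvector u_j of A, weighted by β_j = ⟨u_j|b⟩; post-selecting the ancilla on 1 leaves a state proportional to A^{-1}|b⟩, with success probability Σ_j |β_j C/λ_j|². The sketch handles only real symmetric 2×2 matrices and models no noise or finite-precision phase estimation; the constant C and the function names are hypothetical choices.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def eig_sym_2x2(m: Matrix) -> Tuple[List[float], List[List[float]]]:
    """Eigenvalues and unit eigenvectors of a real symmetric 2 x 2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    lams = [mean + radius, mean - radius]
    if b == 0:  # already diagonal
        vecs = [[1.0, 0.0], [0.0, 1.0]] if a >= d else [[0.0, 1.0], [1.0, 0.0]]
        return lams, vecs
    vecs = []
    for lam in lams:
        v0, v1 = lam - d, b  # solves (A - lam I) v = 0 when b != 0
        n = math.hypot(v0, v1)
        vecs.append([v0 / n, v1 / n])
    return lams, vecs


def hhl_ideal(m: Matrix, b_vec: List[float], c: float) -> Tuple[float, List[float]]:
    """Return (P(ancilla = 1), normalized solution state) of an ideal HHL run."""
    lams, vecs = eig_sym_2x2(m)
    if c <= 0 or min(abs(lam) for lam in lams) < c:
        raise ValueError("need 0 < C <= min |lambda| so that |C / lambda| <= 1")
    norm_b = math.hypot(*b_vec)
    b_state = [v / norm_b for v in b_vec]  # amplitude-encoded |b>
    x, p_success = [0.0, 0.0], 0.0
    for lam, u in zip(lams, vecs):
        beta = u[0] * b_state[0] + u[1] * b_state[1]  # beta_j = <u_j|b>
        amp = beta * c / lam  # amplitude of |u_j>|1> after the controlled rotation
        p_success += amp ** 2
        x = [x[0] + amp * u[0], x[1] + amp * u[1]]
    norm_x = math.hypot(*x)
    return p_success, [v / norm_x for v in x]


A = [[2.0, 1.0], [1.0, 2.0]]
p, state = hhl_ideal(A, [1.0, 0.0], c=1.0)
print(p, state)  # p is about 0.556; state is about [0.894, -0.447], i.e. A^{-1} b / |A^{-1} b|
```

Note that HHL returns the normalized direction of the solution, not its magnitude, and reading out every amplitude requires repeated runs; both points matter when comparing quantum and classical parameter estimates.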

The pseudocode of the quantum fuzzy regression model in this thesis is shown in

Short-term stock index prediction is one of the most important application scenarios in finance. Given the large potential of China's economic development and its stable, favorable domestic situation, stock prices in China's stock market are taken to be linearly correlated, so the quantum fuzzy regression model can be built on historical data of a Chinese stock index to predict the index trend in the short term. The China Securities Index (CSI) series of scale indexes includes CSI 100, 200 and 500. The CSI 500 is a small-cap stock index suited to small long-term investments in asset allocation and is therefore of greater research significance. Data from the China Securities 500 Index from August 16, 2021 to September 24, 2021 (data source: China Securities Index official website, China Securities Small Cap 500 Index, China Securities Index Co., Ltd., CSIndex.com.cn) are therefore used as the original data sets in this thesis to build a model that predicts the closing price of the CSI 500 in the short term.

The original data set was collected from the historical CSI 500 data from August 20, 2021 to September 13, 2021 (data source: historical data on the official CSI website, rounded to two decimal places). The original data set of CSI 500 is shown in

Open | High | Low | Close | Change | Change (%) | Volume | Turnover |
---|---|---|---|---|---|---|---|
6915.02 | 6952.52 | 6843.00 | 6920.02 | −27.03 | −0.39 | 175.63 | 2051.36 |
6949.33 | 7031.88 | 6949.33 | 7024.56 | 104.54 | 1.51 | 189.46 | 2163.00 |
7031.65 | 7098.03 | 7009.69 | 7090.78 | 66.22 | 0.94 | 202.73 | 2364.79 |
7092.15 | 7153.27 | 7058.57 | 7153.00 | 62.22 | 0.88 | 198.13 | 2287.37 |
7151.53 | 7175.51 | 7115.84 | 7118.84 | −34.16 | −0.48 | 230.18 | 2553.64 |
7096.65 | 7185.73 | 7083.21 | 7180.11 | 61.26 | 0.86 | 223.83 | 2589.98 |
7202.63 | 7262.60 | 7190.61 | 7226.88 | 46.77 | 0.65 | 250.65 | 3010.15 |
7202.52 | 7255.34 | 7159.30 | 7255.34 | 28.46 | 0.39 | 245.11 | 2886.57 |
7271.03 | 7278.16 | 7100.31 | 7193.10 | −62.24 | −0.86 | 295.52 | 3310.23 |
7172.40 | 7309.33 | 7172.40 | 7307.01 | 113.91 | 1.58 | 253.41 | 2715.45 |
7335.70 | 7385.24 | 7227.03 | 7278.68 | −28.33 | −0.39 | 288.80 | 3150.17 |
7297.36 | 7359.54 | 7231.73 | 7355.40 | 76.72 | 1.05 | 255.88 | 2737.20 |
7360.38 | 7501.13 | 7350.70 | 7498.93 | 143.52 | 1.95 | 260.73 | 2804.03 |
7496.33 | 7549.45 | 7483.87 | 7544.04 | 45.12 | 0.60 | 272.85 | 2878.51 |
7534.98 | 7617.66 | 7526.97 | 7617.45 | 73.41 | 0.97 | 287.20 | 3023.80 |
7613.60 | 7661.43 | 7564.26 | 7607.67 | −9.78 | −0.13 | 293.47 | 3226.88 |
7613.54 | 7650.88 | 7566.78 | 7648.75 | 41.07 | 0.54 | 263.62 | 2989.56 |

From the original CSI 500 data set, the factors affecting the next day's closing price are identified and all candidate independent variables are analyzed. General-purpose quantum computers offer only a limited number of qubits (the IBM quantum cloud platform available for this study provides five), and the input matrix may not be Hermitian. Therefore, the two independent variables with the highest significance are selected as the independent variables of the quantum fuzzy regression model, and the next day's closing price is taken as the dependent variable.
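When the input matrix is not Hermitian, a standard workaround in the HHL literature, not necessarily the approach taken in this paper, is to embed an n × m matrix A in the Hermitian block matrix [[0, A], [A^†, 0]] and solve with right-hand side (b, 0); for a square invertible A the solution of the enlarged system is (0, x) with Ax = b. For a square A this doubles the dimension, i.e. costs one extra qubit. The sketch builds the embedding with plain lists; the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[complex]]


def conj_transpose(a: Sequence[Sequence[complex]]) -> Matrix:
    """A^dagger: transpose and complex-conjugate every entry."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def hermitian_embed(a: Sequence[Sequence[complex]]) -> Matrix:
    """Return the Hermitian block matrix [[0, A], [A^dagger, 0]] for an n x m matrix A."""
    n, m = len(a), len(a[0])
    a_dag = conj_transpose(a)
    top = [[0j] * n + [complex(v) for v in a[i]] for i in range(n)]
    bottom = [a_dag[j] + [0j] * m for j in range(m)]
    return top + bottom


def matvec(h: Matrix, v: Sequence[complex]) -> List[complex]:
    """Plain matrix-vector product."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]


A = [[1, 2], [3, 4]]  # not Hermitian
H = hermitian_embed(A)
x = [1, 1]
print(H == conj_transpose(H))  # True
print(matvec(H, [0, 0] + x))   # (A x, 0) = [3, 7, 0, 0] as complex numbers
```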

Data uncertainty is common in many practical applications; in finance, uncertainty is usually called risk. The sources of uncertainty fall mainly into three categories: errors beyond the assumed distribution, the uncertainty of accidental events and cognitive uncertainty. In this thesis's application scenario, the analysis focuses on errors beyond the distribution and the uncertainty of accidental events, that is, on the weights of the data points and the handling of abnormal data. This thesis therefore uses fuzzy membership degrees to describe the uncertainty and builds a model that gives more accurate results. The original data sets are shown in

This paper uses the data envelopment analysis (DEA) algorithm to build the fuzzy membership degree matrix. The dependent variable, the closing price, serves as the output indicator, and the main factors affecting the next day's closing price serve as the input indicators. The efficiency evaluation results computed from these input and output indicators are taken as the fuzzy membership degrees of the data points, producing the corresponding fuzzy membership degree matrix. The fuzzy variable matrix is then obtained as the Hadamard product of the variable matrix and the corresponding fuzzy membership matrix, computed with the quantum swap test circuit, and the fidelity of the swap test circuit is analyzed together with the experimental results. The fuzzy data set is shown in

Open | Change | Close |
---|---|---|
6915.02 | −27.03 | 6949.33 |
6949.33 | 104.54 | 7031.65 |
7031.65 | 66.22 | 7092.15 |
7092.15 | 62.22 | 7151.53 |
7151.53 | −34.16 | 7096.65 |
7096.65 | 61.26 | 7202.63 |
7202.63 | 46.77 | 7202.52 |
7202.52 | 28.46 | 7271.03 |
7271.03 | −62.24 | 7172.40 |
7172.40 | 113.91 | 7335.70 |
7335.70 | −28.33 | 7297.36 |
7297.36 | 76.72 | 7360.38 |
7360.38 | 143.52 | 7496.33 |
7496.33 | 45.12 | 7534.98 |
7534.98 | 73.41 | 7613.60 |
7613.60 | −9.78 | 7613.54 |

To verify the validity of the quantum fuzzy linear regression model, the validity of the fuzzy linear regression model must be verified first. The original data set is standardized with zero-mean score (z-score) normalization. In this thesis, a multiple linear regression model is established with z-score (Open) and z-score (Change) as independent variables and z-score (Close) as the dependent variable. The timeliness data are abbreviated as T-Open, T-Change and T-Close, and the membership fuzzy point data as M-Open, M-Change and M-Close. The model summaries of the multiple linear regression model and the fuzzy linear regression models obtained from this analysis are shown in
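The z-score standardization step can be sketched as follows, assuming the sample standard deviation (divisor n - 1) is used; the paper does not state which variant was applied. The example uses the first five Open values of the data set above, and the same function would be applied to the Change and Close columns; `z_score` is a hypothetical helper.

```python
import statistics
from typing import List


def z_score(values: List[float]) -> List[float]:
    """Standardize to zero mean and unit sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    return [(v - mu) / sd for v in values]


open_prices = [6915.02, 6949.33, 7031.65, 7092.15, 7151.53]  # first five Open values
z = z_score(open_prices)
print([round(v, 3) for v in z])
```

Standardizing all variables removes the scale difference between index levels (around 7000) and daily changes (tens of points), so the regression coefficients become directly comparable.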

R | R square | Adjusted R square | Error of the estimate | R square change | F change | df1 | df2 | Sig. F |
---|---|---|---|---|---|---|---|---|
0.973 | 0.946 | 0.938 | 0.24948 | 0.946 | 114.005 | 0.936 | 0.195 | .000 |

df1, df2: z-score (change), z-score (open)

R | R square | Adjusted R square | Error of the estimate | R square change | F change | df1 | df2 | Sig. F |
---|---|---|---|---|---|---|---|---|
0.942 | 0.887 | 0.863 | 0.16468 | 0.864 | 112.834 | 0.966 | 0.185 | .000 |

df1, df2: z-score (T-change), z-score (T-open)

R | R square | Adjusted R square | Error of the estimate | R square change | F change | df1 | df2 | Sig. F |
---|---|---|---|---|---|---|---|---|
0.989 | 0.978 | 0.974 | 0.160927 | 0.978 | 283.102 | 0.958 | 0.3 | .000 |

df1, df2: z-score (M-change), z-score (M-open)

According to the analysis of

Through comparative analysis of

Comparative analysis of

In summary, fuzzy point data regression, on which the quantum fuzzy regression model is built, improves the accuracy of the traditional regression model and gives more accurate results than other fuzzy regression models such as timeliness data regression.

A fuzzy linear regression model describes the uncertainty of the data set better and yields better parameter estimates for uncertain data. To verify the effectiveness of the quantum fuzzy linear regression model, this thesis runs a circuit simulation of the quantum fuzzy linear regression model on the IBM quantum cloud platform to estimate the parameters. The quantum solution is compared with the classical solution, and a fidelity analysis of the quantum solution is used to verify its validity.

For a variable matrix without multicollinearity, the quantum HHL algorithm solves the linear equations to obtain the parameter components, and the fidelity of the quantum solution is calculated to verify the accuracy of the HHL simulation experiment. The final parameters are computed as the average of the parameter components and compared with the parameters of the fuzzy linear regression model to verify the parameter results of the quantum fuzzy linear regression model.
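Fidelity between a quantum solution and the classical solution is commonly measured as the squared overlap of the two normalized vectors, F = |⟨x_q|x_c⟩|², which equals 1 when they agree up to a global phase. The paper does not spell out its formula at this point, so this is an assumed definition; the function name and the example vectors are hypothetical.

```python
import math
from typing import Sequence


def fidelity(u: Sequence[complex], v: Sequence[complex]) -> float:
    """Squared overlap |<u|v>|^2 of the normalized vectors (1.0 = same state up to phase)."""
    norm_u = math.sqrt(sum(abs(x) ** 2 for x in u))
    norm_v = math.sqrt(sum(abs(x) ** 2 for x in v))
    inner = sum(complex(x).conjugate() * y for x, y in zip(u, v))
    return abs(inner) ** 2 / (norm_u * norm_v) ** 2


classical = [2.0, -1.0]  # hypothetical classical solution direction
quantum = [0.88, -0.47]  # hypothetical amplitudes read out from a quantum run
print(round(fidelity(classical, quantum), 4))
```

Because the measure ignores overall scale, it checks only the direction of the HHL output; the magnitude of the parameters has to be recovered separately.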

Because of the limited number of available qubits, this thesis builds the fuzzy linear regression model on a standardized, partitioned fuzzy variable matrix and estimates its parameters block by block with the quantum HHL algorithm. The standardized partitioned fuzzy variable matrices used with the HHL algorithm and the corresponding parameter components are shown in

According to

Fidelity measures how closely the result of the quantum computation matches the ideal solution. The experiment shows that the fidelity of the result is high with a small number of qubits, and the result computed with quantum states is consistent with that of the classical algorithm.

To address the fitting of complex, high-dimensional data in the big data environment, this thesis proposes a quantum fuzzy regression model. The model uses the advantages of quantum computing to encode and compute data efficiently and introduces fuzzy membership degrees to obtain a better fit. In the operational analysis, a sample data set illustrates the algorithm's advantages and results. We believe that, as the number of available qubits increases, quantum algorithms combined with fuzzy mathematics will be able to handle complex real-world problems better.

In the research process of this thesis, I have received a great deal of support and assistance. I would first like to thank my tutor Yan Chang, for her valuable guidance throughout my studies. In addition, I would like to thank my parents for their wise counsel and sympathetic ear. You are always there for me. Finally, I could not have completed this dissertation without the support of my friend Yusheng Lin, who provided stimulating discussions as well as happy distractions to rest my mind outside of my research.

This work is supported by the

The authors declare that they have no conflicts of interest to report regarding the present study.