This work proposes a Tensor Train Random Projection (TTRP) method for dimension reduction, where pairwise distances can be approximately preserved. Our TTRP is systematically constructed through a Tensor Train (TT) representation with TT-ranks equal to one. Based on the tensor train format, this random projection method can speed up the dimension reduction procedure for high-dimensional datasets and incurs lower storage costs with little loss in accuracy, compared with existing methods. We provide a theoretical analysis of the bias and the variance of TTRP, which shows that this approach is an expected isometric projection with bounded variance, and we show that the scaling Rademacher variable is an optimal choice for generating the corresponding TT-cores. Detailed numerical experiments with synthetic datasets and the MNIST dataset are conducted to demonstrate the efficiency of TTRP.

Dimension reduction is a fundamental concept in science and engineering for feature extraction and data visualization. Exploring the low-dimensional structures of high-dimensional data has attracted broad attention. Popular dimension reduction methods include Principal Component Analysis (PCA) [

Random Projection (RP) is a widely used method for dimension reduction. It is well-known that the Johnson-Lindenstrauss (JL) transformation [

To be specific, Achlioptas [

This means that the matrix is sampled at a rate of
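The sparse construction discussed above can be sketched as follows. This sketch uses Achlioptas's standard scheme, in which entries take the values +√3, 0, and −√3 with probabilities 1/6, 2/3, and 1/6; the exact sampling rate in the truncated passage above may differ, so treat the specific probabilities here as illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Achlioptas-type sparse random projection (standard construction; the exact
# variant referenced in the text may use a different sparsity level).
M, N = 20, 1000
R = np.sqrt(3.0) * rng.choice([1.0, 0.0, -1.0], size=(M, N), p=[1/6, 2/3, 1/6])

# Project a data point; the 1/sqrt(M) scaling makes E[||y||^2] = ||x||^2.
x = rng.standard_normal(N)
y = R @ x / np.sqrt(M)
```

Roughly two thirds of the entries of `R` are zero, which is what reduces the storage and computational cost relative to a dense Gaussian matrix.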

Recently, methods using matrix or tensor decompositions to reduce the storage of projection matrices have been proposed in [

The rest of the paper is organized as follows. The tensor train format is introduced in

Let lowercase letters

The Kronecker product conforms the following laws [
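As one concrete instance of such laws, the mixed-product property, (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), can be checked numerically; the choice of this particular law is illustrative, since the cited list is truncated above:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))

# Mixed-product property of the Kronecker product:
# (A kron B)(C kron D) = (AC) kron (BD)
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
assert np.allclose(lhs, rhs)
```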

Tensor Train (TT) decomposition [

Given a

To look more closely at

The tensor train format gives a compact representation of matrices and efficient computation for matrix-by-vector products. We first review the TT-format of large matrices and vectors following [

In

The subtraction of tensor

The dot product of tensor

Since

For simplicity, we assume that the TT-ranks of

The Frobenius norm of a tensor

Computing the distance between tensor

The complexity of computing the distance is also

In summary, the subtraction of two tensors in the TT-format can be performed by simply merging their cores, rather than subtracting the two tensors directly in the standard tensor format, and the dot product of two tensors in the TT-format can be computed through a sequence of matrix-by-vector products. The cost of computing the distance between two tensors in the TT-format reduces from the original complexity
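These operations can be sketched in NumPy as follows: the dot product is accumulated core by core, and the distance then follows from three dot products without ever forming full tensors. Core shapes follow the convention G_k of size (r_{k-1}, n_k, r_k); the function names are illustrative:

```python
import numpy as np

def tt_dot(cores_a, cores_b):
    """Dot product of two tensors given by their TT-cores (r_prev, n_k, r_next)."""
    v = np.ones((1, 1))  # running contraction over the left rank indices
    for A, B in zip(cores_a, cores_b):
        # contract v with A over A's left rank, then with B over mode and rank
        w = np.einsum('ab,aic->icb', v, A)   # shape (n_k, rA_next, rB_prev)
        v = np.einsum('icb,bid->cd', w, B)   # shape (rA_next, rB_next)
    return float(v[0, 0])

def tt_distance(cores_a, cores_b):
    """Frobenius distance via three TT dot products, never forming full tensors."""
    d2 = (tt_dot(cores_a, cores_a)
          - 2.0 * tt_dot(cores_a, cores_b)
          + tt_dot(cores_b, cores_b))
    return np.sqrt(max(d2, 0.0))  # clip tiny negative rounding errors
```

Each step of the loop is a small matrix contraction, which is what keeps the overall cost linear in the number of modes rather than exponential in it.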

Due to the computational efficiency of TT-format discussed above, we consider the TT-format to construct projection matrices. Our tensor train random projection is defined as follows.

In

By the TT-format,

Here as we set the TT-ranks of
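With all TT-ranks equal to one, the tensorized projection matrix factorizes as a Kronecker product of the cores, so the projection can be applied mode by mode without ever forming the full matrix. A minimal NumPy sketch of this construction with the default Rademacher cores (function names and the mode-by-mode application scheme are our illustration, not the paper's reference implementation):

```python
import numpy as np

def ttrp_cores(m_dims, n_dims, rng):
    # Rank-one TT-cores: one m_i x n_i Rademacher (+/-1) matrix per mode.
    return [rng.choice([-1.0, 1.0], size=(m, n)) for m, n in zip(m_dims, n_dims)]

def ttrp_apply(cores, x):
    # y = R x / sqrt(M) without forming the full Kronecker-product matrix:
    # reshape x into an n_1 x ... x n_d tensor and contract one mode at a time.
    n_dims = [A.shape[1] for A in cores]
    M = int(np.prod([A.shape[0] for A in cores]))
    T = x.reshape(n_dims)
    for k, A in enumerate(cores):
        T = np.moveaxis(np.tensordot(A, np.moveaxis(T, k, 0), axes=([1], [0])), 0, k)
    return T.reshape(-1) / np.sqrt(M)
```

Applying the cores mode by mode costs a sequence of small matrix products instead of one dense M-by-N multiplication, which is the source of the computational savings.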

Substituting

We compute the first term of the right hand side of

Considering the

Substituting

Similarly, the second term

If

Supposing that

Similarly, if for

Hence, combining

Therefore, using

In summary, substituting

One can see that the bound of the variance

Note that as
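The expected-isometry property derived above can also be checked empirically by averaging the squared norm of the projection over freshly drawn Rademacher cores; the dimensions below are illustrative and not taken from the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(1)
n_dims, m_dims = [4, 5, 5], [2, 2, 2]   # N = 100 mapped down to M = 8 (assumed)
x = rng.standard_normal(int(np.prod(n_dims)))
M = int(np.prod(m_dims))

# Monte Carlo estimate of E[||f(x)||^2] over independent rank-one
# Rademacher TT projections; it should approach ||x||^2.
vals = []
for _ in range(3000):
    cores = [rng.choice([-1.0, 1.0], size=(m, n)) for m, n in zip(m_dims, n_dims)]
    T = x.reshape(n_dims)
    for k, A in enumerate(cores):
        T = np.moveaxis(np.tensordot(A, np.moveaxis(T, k, 0), axes=([1], [0])), 0, k)
    y = T.reshape(-1) / np.sqrt(M)
    vals.append(float(y @ y))

est = float(np.mean(vals))
# est is close to x @ x; the spread of vals reflects the bounded variance
```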

Proposition 3.1 extends the Hanson-Wright inequality, whose proof can be found in [

We note that the upper bound in Proposition 3.2 is not tight, as it involves the dimensionality of datasets (

The procedure of TTRP is summarized in Algorithm 2. For the input of this algorithm, the TT-ranks of

We demonstrate the efficiency of TTRP using synthetic datasets and the MNIST dataset [

In Definition 3.1, we set the TT-ranks to be one. To explain the motivation for this setting, we investigate the effect of different TT-ranks; here we consider the situation where the TT-ranks take

Two synthetic datasets with dimension

Two synthetic datasets are tested to assess the effect of different distributions for TT-cores, whose sizes are

The storage of the projection matrix and the cost of computing

| | Gaussian RP | Very sparse RP | Gaussian TRP | TTRP |
|---|---|---|---|---|
| Storage cost | | | | |
| Computational cost | | | | |

Two synthetic datasets with size

| Method | Mean | Variance | Storage |
|---|---|---|---|
| Gaussian RP | 0.9908 | 0.0032 | 240000 |
| Very Sparse RP | 0.9963 | 0.0025 | 2400 |

| Dimensions for tensorization | | Gaussian TRP | | | TTRP | | |
|---|---|---|---|---|---|---|---|
| | | Mean | Variance | Storage | Mean | Variance | Storage |
| [6, 4] | [100, 100] | 0.9908 | 0.0026 | 4800 | 0.9884 | 0.0026 | 1000 |
| [4, 3, 2] | [25, 20, 20] | 0.9747 | 0.0062 | 1560 | 0.9846 | 0.0028 | 200 |
| [3, 2, 2, 2] | [10, 10, 10, 10] | 0.9811 | 0.0123 | 960 | 0.9851 | 0.0035 | 90 |
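The storage figures in the tables above can be reproduced by counting stored scalars for each method. The formulas below are our reading of the table entries, assuming N = 10000, M = 24, and sparsity 1/√N for Very Sparse RP:

```python
import numpy as np

def storage_gaussian_rp(M, N):
    return M * N                        # dense M x N matrix

def storage_very_sparse_rp(M, N):
    return int(M * N / np.sqrt(N))      # expected nonzeros at sparsity 1/sqrt(N)

def storage_gaussian_trp(M, n_dims):
    return M * sum(n_dims)              # M rows, each a rank-one tensor over the n_i

def storage_ttrp(m_dims, n_dims):
    return sum(m * n for m, n in zip(m_dims, n_dims))  # one m_i x n_i core per mode

print(storage_gaussian_rp(24, 10000))        # 240000
print(storage_very_sparse_rp(24, 10000))     # 2400
print(storage_gaussian_trp(24, [100, 100]))  # 4800
print(storage_ttrp([6, 4], [100, 100]))      # 1000
```

Note how the TTRP storage depends only on the core sizes, not on M, which is why it shrinks further as the tensorization uses more, smaller modes.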

Next, the CPU times for projecting a data point using the four methods (TTRP, Gaussian TRP, Very Sparse RP, and Gaussian RP) are assessed. Here, we set the reduced dimension

Finally, we validate the performance of our TTRP approach using the MNIST dataset [

Random projection plays a fundamental role in dimension reduction for high-dimensional datasets, where pairwise distances need to be approximately preserved. With a focus on efficient tensorized computation, this paper develops a tensor train random projection (TTRP) method. Based on our analysis of the bias and the variance, TTRP is proven to be an expected isometric projection with bounded variance. From the analysis in Theorem 3.2, the Rademacher distribution is shown to be an optimal choice for generating the TT-cores of TTRP. For computational convenience, the TT-ranks of TTRP are set to one, and our numerical results show that larger TT-ranks do not lead to significant differences in the mean or the variance of the ratio of pairwise distances. Our detailed numerical studies show that, compared with standard projection methods, TTRP with the default setting (TT-ranks equal to one and TT-cores generated from the Rademacher distribution) requires significantly smaller storage and computational costs while achieving competitive performance. The numerical results also show that TTRP has smaller variances than tensor train random projection methods based on Gaussian distributions. Although we have proven the properties of the mean and the variance of TTRP and the numerical results show that TTRP is efficient, the upper bound in Proposition 3.2 involves the dimensionality of datasets (

The authors thank Osman Asif Malik and Stephen Becker for helpful suggestions and discussions.

If all TT-ranks of tensorized matrix

We set

Substituting

Applying

Generally, for some