This paper presents an effective machine learning-based depth selection algorithm for the CTU (Coding Tree Unit) in HEVC (High Efficiency Video Coding). Existing machine learning methods are limited in their ability to handle the initial depth decision of the CU (Coding Unit) and to select a proper set of input features for the depth selection model. We first propose a new classification approach for predicting the initial division depth. In particular, we study the correlation among texture complexity, QPs (quantization parameters), and the depth decision of the CUs in order to forecast the initial partition depth of the current CU. Second, we determine the input features of the classifier by analysing the correlation between the depth decision of the CUs, picture distortion, and bit-rate. Using these relationships, we also study a decision method for the final partition depth of the current CU that takes bit-rate and picture distortion as input. Finally, we formulate the depth division of the CUs as a binary classification problem and use the nearest neighbor classifier to solve it. The proposed method significantly improves the efficiency of inter-frame coding by avoiding the cost of traversing every division depth. Experimental results show that the proposed method reduces encoding time by 34.56% compared with HM-16.9 while keeping the partition depth of the CUs correct.

Inter-frame prediction is an important part of the HEVC encoding process. By using the temporal correlation between consecutive video frames [

The mathematical equation of the rate-distortion value

where

To make the optimal PU decision with the lowest rate-distortion cost, the HEVC encoder needs to exhaustively check all possible PU partitions. When the encoder completes motion prediction for all CU depths, it finds the best PU partitioning and merges the CU motion vectors in bottom-up order. The computational cost of the rate-distortion calculation for each CTU is very high [

In addition, the BasketballDrill sequence (QP = 22) is encoded by the HM-16.9 test model in the LD (low delay) configuration. We show the proportion of coding time taken by each part of HEVC in

According to the judgment method used, effective depth decision methods for the inter-frame CU can be divided into two categories: threshold comparison-based fast depth selection algorithms and machine learning-based fast depth selection algorithms. Threshold comparison is the most common method used to determine whether the current coding unit needs to be further divided. For example, [

Therefore, researchers have taken advantage of machine learning techniques to effectively decrease the computation time of HEVC. The process of dividing CUs is treated as a classification problem and solved with classical machine learning classification algorithms. The encoder can then estimate the division depth of a CU without any traversal calculation. For example, [

As shown in

As shown in

Selecting a proper set of input features greatly affects the classifier, since it can reduce the training and prediction time while also improving the accuracy of depth selection. In order to maintain the coding efficiency of HEVC, bit rate and distortion can be used as reference conditions to select the correct coding method. Therefore, we explore the influence of bit rate and distortion on the CU depth division and the validity of utilizing them as inputs of the classifier.

As can be seen from

As can be seen from

As shown in

The NN (nearest neighbor) [

The NN classifier is used to decide the depth division of the CU. The video sequences used contain rich content and scenes at different resolutions; that is, they contain not only a large number of shallow-depth CUs in regions with simple texture and flat content, but also deep-depth CUs in regions with intricate patterns or violent motion.

Following the ideas of “top skip” and “early termination”, this paper proposes a machine learning-based effective depth decision method for the CU with two important parts. The first part forecasts the initial partition depth of the CU, primarily using the texture complexity and QPs. The second part determines the final division depth of the CU, primarily by deciding whether to terminate the depth division early.

Here we first introduce the “top skip” mechanism used in our fast depth selection algorithm. By using texture complexity and QPs to predict the initial partition depth of the CU, we reduce the time wasted by uniformly dividing every CU from depth 0. The detailed process is as follows:

Four test sequences, BQTerrace, BasketballDrill, BQSquare, and FourPeople, were selected with different QPs and entropy values. The X-axis is the quantization parameter, and the Y-axis is the entropy, which lies in the range 0.5 to 7.5. Constructing a prediction dictionary is a simple, practical, and effective way to distinguish shallow-depth CUs from deep-depth CUs.
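To make the dictionary idea concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes that texture complexity is measured as the Shannon entropy of the pixel-value histogram of a block, and that the dictionary maps each QP to a single entropy threshold. The names `block_entropy`, `ENTROPY_THRESHOLD`, and `initial_depth`, as well as every threshold value, are hypothetical placeholders rather than trained values.

```python
import math
from collections import Counter

def block_entropy(block):
    """Shannon entropy (in bits) of the pixel-value histogram of a 2-D block."""
    pixels = [p for row in block for p in row]
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical dictionary: QP -> entropy threshold separating shallow from deep CUs.
# These numbers are placeholders for illustration only, not trained values.
ENTROPY_THRESHOLD = {22: 4.0, 27: 4.5, 32: 5.0, 37: 5.5}

def initial_depth(entropy, qp, thresholds=ENTROPY_THRESHOLD):
    """Return the starting depth: 0 for a shallow-depth CU, 2 for a deep-depth CU."""
    return 2 if entropy > thresholds[qp] else 0

flat_block = [[128, 128], [128, 128]]    # uniform content, entropy 0
striped_block = [[0, 255], [0, 255]]     # two equally likely values, entropy 1 bit
print(block_entropy(flat_block), block_entropy(striped_block))
print(initial_depth(block_entropy(flat_block), qp=22))
```

A single threshold per QP is the simplest possible form of such a dictionary; a trained dictionary could equally store a label for each (QP, entropy bin) pair.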

In order to forecast whether the current CU needs to be divided, this paper feeds the NN classifier with the rate and distortion as input features. Here we introduce the details of the different components involved in this process.
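As an illustration of this binary split decision, the sketch below implements a plain 1-nearest-neighbor rule over (bit rate, distortion) pairs. It assumes Euclidean distance and unscaled features; in practice the two features would likely need normalization because their ranges differ. The function name `nn_predict` and all sample values are invented for illustration and are not taken from the experiments.

```python
import math

def nn_predict(train_x, train_y, query):
    """1-NN rule: return the label of the training sample closest to `query`."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[nearest]

# Made-up training samples: (bit rate, distortion) -> 1 = split, 0 = do not split.
train_x = [(120.0, 900.0), (150.0, 1100.0), (800.0, 5200.0), (950.0, 6100.0)]
train_y = [0, 0, 1, 1]

print(nn_predict(train_x, train_y, (130.0, 1000.0)))   # closest to a "do not split" sample
print(nn_predict(train_x, train_y, (900.0, 6000.0)))   # closest to a "split" sample
```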

First, we train the initial division depth prediction dictionary on four sequences. Second, the type of the current CTU is determined by looking up the dictionary with the QP and the minimum entropy value. Third, the quadtree is traversed from depth 0 if the current CU is a shallow-depth CU; otherwise, it is traversed from depth 2. Fourth, we train the NN classifier on the bit rate, distortion, and division results of a large variety of CUs. Fifth, the classifier takes the distortion and bit rate as input and outputs the division decision. Sixth, if the decision is to divide, the current CU is divided further and the previous step is repeated until the largest depth is reached; otherwise, the division stops.
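The control flow of the third to sixth steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the HM implementation: it follows a single branch of the quadtree instead of deciding each sub-CU separately, it assumes a maximum depth of 3, and the callables `rd_features` (returning the bit rate and distortion observed at a given depth) and `should_split` (the trained classifier) are hypothetical stand-ins.

```python
MAX_DEPTH = 3  # assumed largest CU depth (64x64 CTU down to 8x8 CU)

def decide_final_depth(start_depth, rd_features, should_split):
    """Start at the predicted initial depth and keep dividing while the
    classifier says to split, stopping early or at the largest depth."""
    depth = start_depth
    while depth < MAX_DEPTH and should_split(rd_features(depth)):
        depth += 1
    return depth

# Toy stand-ins for illustration only: the features are just (depth, depth),
# and the "classifier" asks for a split while the first feature is below 2.
toy_features = lambda depth: (float(depth), float(depth))
toy_classifier = lambda feats: feats[0] < 2

print(decide_final_depth(0, toy_features, toy_classifier))  # shallow CU, starts at depth 0
print(decide_final_depth(2, toy_features, toy_classifier))  # deep CU, starts at depth 2
```

Passing the classifier as a callable keeps the loop independent of the particular model, so the nearest neighbor rule could be swapped for another binary classifier without changing the traversal.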

Video sequences with rich content and scenes were used for experiments to test the performance of the proposed depth determination method.

This experiment uses BDBR (Bjøntegaard Delta Bit-rate) and BDPSNR (Bjøntegaard Delta Peak Signal-to-Noise Ratio) to express the coding performance of the proposed method. As shown in

This paper compares the performance of the proposed method with the fast coding method [

To prove that the termination depth partition prediction method is effective, we tested the classification accuracy when the depth is 0, 1 and 2, respectively. As shown in

This paper proposes an effective CU depth decision method that combines the ideas of top skip and early termination to address the high complexity resulting from the deep traversal of the CU in HEVC. First, based on texture complexity and QPs, a scheme for forecasting the initial partition depth of the CU is designed. Following the idea of top skip, this scheme lets deep-depth CUs skip the rate-distortion cost calculation at depths 0 and 1, thus saving coding time. Second, a final depth decision method based on the nearest neighbor classifier is designed. This method models the division problem as a binary classification problem and realizes early termination. It reduces the calculation cost at depths 2 and 3 for shallow-depth CUs, and reduces the complexity of the encoder. Experimental results show that the proposed method saves an average of 34.56% of the coding time compared with the original HEVC encoder, and achieves a balance between encoded video quality and bit rate cost.