Multi-label learning is a supervised learning task in which each sample may belong to multiple labels simultaneously. This characteristic makes multi-label learning more complex and more difficult than multi-class classification. The multi-label twin support vector machine (MLTSVM) [

Multi-label learning is a supervised learning task in which each sample may belong to several different labels simultaneously. Many real-world applications involve multi-label learning, including text classification [

The TSVM [

The aforementioned improvements to the MLTSVM mainly focus on generalization performance and learning speed. For multi-label learning problems, obtaining all labels of all samples is time consuming and difficult; in practice, only a small number of fully labelled samples is available alongside a large number of partially labelled and unlabelled samples. However, the MLTSVM and its improvements use only the expensive labelled samples and ignore the inexpensive partially labelled and unlabelled ones. To address this shortcoming, we propose a novel semi-supervised MLTSVM, named SS-MLTSVM, which exploits the geometric information of the marginal distribution embedded in partially labelled and unlabelled samples by introducing a manifold regularization term into each sub-classifier, and which uses the successive overrelaxation (SOR) method to accelerate training. Experimental results show that, compared with the MLTSVM, the SS-MLTSVM achieves better classification performance.

The remainder of this paper is organized as follows: Section 2 reviews related work, including the TSVM and MLTSVM. Section 3 presents the SS-MLTSVM in detail, covering the linear model, the nonlinear model, the decision rule and the training algorithm. Section 4 reports experimental results on the benchmark datasets, and Section 5 concludes the paper.

For the binary classification problem, we suppose the training set is

The TSVM aims to find two nonparallel hyperplanes:

The original problem of the TSVM is:

where

By introducing the Lagrange multiplier, the dual problems of

where

The two hyperplanes can be obtained by solving the dual problems as follows:
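As a hedged illustration of this step: in the classical TSVM, once the dual multipliers are known, the two augmented plane vectors have closed forms in terms of the class matrices. The sketch below assumes class matrices `A` and `B` (one sample per row), dual vectors `alpha` and `beta`, and an illustrative ridge term `eps` for numerical stability; these names are not taken from the paper.

```python
import numpy as np

def tsvm_planes_from_duals(A, B, alpha, beta, eps=1e-4):
    """Recover the two nonparallel TSVM hyperplanes (w1, b1) and (w2, b2)
    from dual solutions alpha, beta.  Standard TSVM closed forms:
        [w1; b1] = -(H^T H)^{-1} G^T alpha,   H = [A, e]
        [w2; b2] =  (G^T G)^{-1} H^T beta,    G = [B, e]
    eps is a small ridge term (an illustrative assumption) that keeps the
    Gram matrices well conditioned."""
    H = np.hstack([A, np.ones((A.shape[0], 1))])  # augment with bias column
    G = np.hstack([B, np.ones((B.shape[0], 1))])
    u1 = -np.linalg.solve(H.T @ H + eps * np.eye(H.shape[1]), G.T @ alpha)
    u2 = np.linalg.solve(G.T @ G + eps * np.eye(G.shape[1]), H.T @ beta)
    return (u1[:-1], u1[-1]), (u2[:-1], u2[-1])
```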

For the multi-label problem, we denote the training set as:

where

The MLTSVM seeks

We denote the samples belonging to the

where

By introducing the Lagrange multiplier, the dual problems of

where

By solving the dual problem

Similar to the MLTSVM, the ML-STSVM also seeks

The original problem for label

where

The dual problem of

where

By solving the dual problem

For the semi-supervised multi-label problem, we define the training set as follows:

where

The manifold regularization framework [

where

Similar to the MLTSVM, for each label, the SS-MLTSVM seeks a hyperplane:

For the

To make full use of

The regularization term

The manifold regularization term

where

and
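The graph Laplacian appearing in the manifold regularization term is typically built from a nearest-neighbour affinity graph over all labelled and unlabelled samples. The sketch below assumes a symmetrized heat-kernel affinity with illustrative parameters `k` and `sigma`; the paper's exact graph construction may differ.

```python
import numpy as np

def knn_laplacian(X, k=5, sigma=1.0):
    """Build the graph Laplacian L = D - W for manifold regularization.
    W is a symmetrized k-nearest-neighbour heat-kernel affinity over all
    samples in X (labelled and unlabelled); k and sigma are assumed
    illustrative defaults, not values from the paper."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]            # skip self at position 0
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                           # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W                # L = D - W
```

The resulting `L` is symmetric positive semi-definite, so the quadratic manifold term it induces is convex.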

For the

where

The Lagrange function of

where

According to

where

For the

In this section, we extend the linear SS-MLTSVM to the nonlinear case using kernel-generated surfaces. For each label, the nonlinear SS-MLTSVM seeks the following hyperplanes:

where

The original problem of the nonlinear SS-MLTSVM is as follows:

The Lagrange function of

According to the KKT conditions, we can obtain:

According to

where

By solving the dual problem, the hyperplane of the

In this subsection, we present the decision function of our SS-MLTSVM. For a new sample

is less than or equal to the given value
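As a rough illustration of this decision rule, the sketch below assigns a label whenever the sample's perpendicular distance to that label's hyperplane does not exceed a threshold. The stacked weight matrix `W` (one `w_k` per row), bias vector `b` and the threshold value are assumptions made for illustration.

```python
import numpy as np

def predict_labels(x, W, b, threshold=0.5):
    """MLTSVM-style decision sketch: sample x receives label k when its
    distance to the k-th hyperplane, |w_k^T x + b_k| / ||w_k||, is less
    than or equal to a given threshold (an assumed parameter)."""
    dists = np.abs(W @ x + b) / np.linalg.norm(W, axis=1)
    return np.where(dists <= threshold)[0]
```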

In this subsection, we use SOR to solve the dual problems

Algorithm 1: The SOR for the optimization problem

INPUT: penalty parameter
OUTPUT: the optimal solution
Step 1: Initialize the iteration variable
Step 2: Decompose
Step 3: While
    Calculate
    Project
End while.
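As a concrete illustration of Algorithm 1, the sketch below applies SOR to a generic box-constrained dual of the form min 0.5·aᵀQa − eᵀa subject to 0 ≤ a ≤ C, which is the shape such twin-SVM duals take; `Q`, `C` and the relaxation factor `omega` are illustrative inputs, not the paper's exact matrices.

```python
import numpy as np

def sor_box_qp(Q, C, omega=1.5, tol=1e-6, max_iter=1000):
    """SOR sketch for  min 0.5*a^T Q a - e^T a,  s.t. 0 <= a <= C.
    Each Gauss-Seidel sweep updates one coordinate with overrelaxation
    factor omega in (0, 2) and projects it back onto the box [0, C]."""
    n = Q.shape[0]
    a = np.zeros(n)                                  # Step 1: initialize
    for _ in range(max_iter):                        # Step 3: iterate
        a_old = a.copy()
        for i in range(n):
            grad_i = Q[i] @ a - 1.0                  # gradient coordinate
            a[i] = min(max(a[i] - omega * grad_i / Q[i, i], 0.0), C)
        if np.linalg.norm(a - a_old) < tol:          # stopping test
            break
    return a
```

With a diagonal `Q`, the unconstrained minimizer of each coordinate is 1/Q[i,i]; the projection step clips it to the box when C is smaller.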

In this section, we present the classification results of backpropagation for multi-label learning (BPMLL) [

Datasets | Domain | Unlabelled | Labelled | Feature | Label
---|---|---|---|---|---
Flags | image | 97 | 97 | 19 | 7
Birds | audio | 322 | 323 | 260 | 19
Emotions | music | 296 | 297 | 72 | 6
Scene | image | 1203 | 1204 | 294 | 6
Yeast | biology | 1235 | 1236 | 103 | 14

The parameters of each algorithm have an important impact on its classification performance, so we use 10-fold cross-validation to select appropriate parameters for each algorithm. For BPMLL, the number of hidden neurons is set to 20% of the input dimension, and the number of training epochs is 100. For the ML-kNN, the number of nearest neighbours is set to 5. For the Rank-SVM, the penalty parameter

In the experiments, we use five popular metrics to evaluate the multi-label classifiers, which are Hamming loss, average precision, coverage, one_error and ranking loss. Next, we introduce these five evaluation metrics in detail.

We denote the total number of samples by

Hamming loss measures the proportion of labels that are wrongly classified.

where

Coverage measures how many steps, on average, we need to go down the ranked label list to cover all true labels of a sample.

One_error measures the proportion of samples whose label with the highest prediction probability is not in the true label set.

where

Ranking loss measures the proportion of label pairs that are ordered reversely.

Average precision measures the proportion of labels ranked above a particular label
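Since these five metrics are standard, they can be sketched directly from a 0/1 label matrix and a real-valued score matrix. In the sketch below, the threshold `thr` used to binarize scores for Hamming loss is an assumption; everything else follows the usual definitions.

```python
import numpy as np

def multilabel_metrics(Y, F, thr=0.0):
    """Compute the five multi-label metrics on a 0/1 label matrix Y
    (n_samples x n_labels) and score matrix F; thr is an assumed cutoff
    that turns scores into predicted label sets."""
    n, q = Y.shape
    P = (F >= thr).astype(int)
    hamming = np.mean(P != Y)                        # wrongly classified labels

    order = np.argsort(-F, axis=1)                   # labels sorted by score
    rank = np.empty_like(order)
    for i in range(n):
        rank[i, order[i]] = np.arange(1, q + 1)      # rank 1 = highest score

    one_error = np.mean([Y[i, order[i, 0]] == 0 for i in range(n)])
    coverage = np.mean([rank[i][Y[i] == 1].max() - 1 for i in range(n)])

    rloss, avg_prec = [], []
    for i in range(n):
        pos, neg = np.where(Y[i] == 1)[0], np.where(Y[i] == 0)[0]
        pairs = [F[i, p] <= F[i, m] for p in pos for m in neg]
        rloss.append(np.mean(pairs) if pairs else 0.0)   # reversed pairs
        prec = [np.sum(rank[i][pos] <= rank[i, p]) / rank[i, p] for p in pos]
        avg_prec.append(np.mean(prec) if prec else 0.0)  # precision at each true label

    return dict(hamming=hamming, one_error=one_error, coverage=coverage,
                ranking_loss=np.mean(rloss), average_precision=np.mean(avg_prec))
```

For a perfect ranking, Hamming loss, one_error, coverage and ranking loss are all 0 and average precision is 1.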

We show the average precision, coverage, Hamming loss, one_error and ranking loss of each algorithm on the benchmark datasets in

where

where

Average precision:

Datasets | BPMLL | ML-kNN | Rank-SVM | MLTSVM | SS-MLTSVM
---|---|---|---|---|---
Flags | 0.785639 ± 0.051239 | 0.793221 ± 0.008711 | 0.795623 ± 0.003876 | 0.777998 ± 0.009513 | |
Birds | 0.358771 ± 0.010801 | 0.421682 ± 0.008798 | 0.406008 ± 0.003114 | 0.404525 ± 0.005299 | |
Emotions | 0.581655 ± 0.030349 | 0.697028 ± 0.004776 | 0.709426 ± 0.005503 | 0.757375 ± 0.005520 | |
Scene | 0.453430 ± 0.020162 | 0.842465 ± 0.003841 | 0.830496 ± 0.002475 | 0.827238 ± 0.002951 | |
Yeast | 0.720021 ± 0.023814 | 0.731047 ± 0.001198 | 0.730825 ± 0.000957 | 0.731715 ± 0.000454 | |

Coverage:

Datasets | BPMLL | ML-kNN | Rank-SVM | MLTSVM | SS-MLTSVM
---|---|---|---|---|---
Flags | 3.875556 ± 0.663135 | 3.777000 ± 0.031977 | 3.797440 ± 0.024735 | 3.837778 ± 0.082848 | |
Birds | 3.923201 ± 0.073242 | 4.101629 ± 0.052476 | 3.889508 ± 0.029002 | 3.482169 ± 0.028734 | |
Emotions | 2.868046 ± 0.294812 | 2.245276 ± 0.034788 | 2.191575 ± 0.032059 | 2.044092 ± 0.035888 | |
Scene | 2.171729 ± 0.141061 | 0.555427 ± 0.008191 | 0.598790 ± 0.007469 | 0.612847 ± 0.011717 | |
Yeast | 6.341873 ± 0.342050 | 6.456955 ± 0.022583 | 6.231757 ± 0.010173 | 6.987227 ± 0.024749 | |

Hamming loss:

Datasets | BPMLL | ML-kNN | Rank-SVM | MLTSVM | SS-MLTSVM
---|---|---|---|---|---
Flags | 0.347143 ± 0.028855 | 0.329460 ± 0.009443 | 0.306206 ± 0.006918 | 0.315238 ± 0.009461 | |
Birds | 0.067793 ± 0.001566 | 0.106989 ± 0.003470 | 0.097220 ± 0.001389 | 0.087929 ± 0.001621 | |
Emotions | 0.374502 ± 0.040605 | 0.263362 ± 0.003004 | 0.264391 ± 0.003745 | 0.222902 ± 0.003495 | |
Scene | 0.285942 ± 0.019638 | 0.151188 ± 0.003112 | 0.184711 ± 0.00128 | 0.147302 ± 0.001591 | |
Yeast | 0.216659 ± 0.016479 | 0.212532 ± 0.000808 | 0.212260 ± 0.001056 | 0.210753 ± 0.001176 | |

One_error:

Datasets | BPMLL | ML-kNN | Rank-SVM | MLTSVM | SS-MLTSVM
---|---|---|---|---|---
Flags | 0.262222 ± 0.012525 | 0.246806 ± 0.029941 | 0.195583 ± 0.021652 | 0.238889 ± 0.022773 | |
Birds | 0.765142 ± 0.019694 | 0.642681 ± 0.004973 | 0.733791 ± 0.012932 | 0.686058 ± 0.012041 | |
Emotions | 0.634943 ± 0.051561 | 0.412047 ± 0.014014 | 0.383724 ± 0.012444 | 0.296793 ± 0.006964 | |
Scene | 0.820441 ± 0.026645 | 0.269452 ± 0.003841 | 0.278494 ± 0.004690 | 0.287614 ± 0.004593 | |
Yeast | 0.240158 ± 0.036489 | 0.265893 ± 0.003882 | 0.241793 ± 0.002361 | 0.249052 ± 0.003871 | |

Ranking loss:

Datasets | BPMLL | ML-kNN | Rank-SVM | MLTSVM | SS-MLTSVM
---|---|---|---|---|---
Flags | 0.229981 ± 0.043959 | 0.228139 ± 0.008245 | 0.239685 ± 0.010365 | 0.229881 ± 0.009013 | |
Birds | 0.313953 ± 0.006586 | 0.318532 ± 0.004400 | 0.302169 ± 0.004040 | 0.292203 ± 0.002815 | |
Emotions | 0.392612 ± 0.039955 | 0.266880 ± 0.004724 | 0.253129 ± 0.005824 | 0.216821 ± 0.006205 | |
Scene | 0.418993 ± 0.028879 | 0.098806 ± 0.001565 | 0.105142 ± 0.001448 | 0.124898 ± 0.002293 | |
Yeast | 0.185699 ± 0.001101 | 0.181432 ± 0.000784 | 0.217908 ± 0.000920 | 0.174228 ± 0.021022 | |

We can obtain

Algorithm ranks by Hamming loss:

Datasets | BPMLL | ML-kNN | Rank-SVM | MLTSVM | SS-MLTSVM
---|---|---|---|---|---
Flags | 5 | 4 | 2 | 3 | |
Birds | 2 | 5 | 4 | 3 | |
Emotions | 5 | 3 | 4 | 2 | |
Scene | 5 | 3 | 4 | 2 | |
Yeast | 5 | 4 | 3 | 2 | |
Average | 4.4 | 3 | 3 | 3 | |

Algorithm ranks by one_error:

Datasets | BPMLL | ML-kNN | Rank-SVM | MLTSVM | SS-MLTSVM
---|---|---|---|---|---
Flags | 5 | 4 | 2 | 3 | |
Birds | 5 | 2 | 4 | 3 | |
Emotions | 5 | 4 | 3 | 2 | |
Scene | 5 | 2 | 3 | 4 | |
Yeast | 2 | 5 | 3 | 4 | |
Average | 4.4 | 3.2 | 2.6 | 3.4 | |

Algorithm ranks by ranking loss:

Datasets | BPMLL | ML-kNN | Rank-SVM | MLTSVM | SS-MLTSVM
---|---|---|---|---|---
Flags | 4 | 2 | 5 | 3 | |
Birds | 4 | 5 | 3 | 2 | |
Emotions | 5 | 4 | 3 | 2 | |
Scene | 5 | 2 | 3 | 4 | |
Yeast | 4 | 3 | 5 | 2 | |
Average | 4.4 | 3.2 | 2.2 | 3.6 | |

From the above analysis, we can conclude that our SS-MLTSVM is superior to the other algorithms for all metrics.

We show the learning time of different algorithms on the benchmark datasets in

Learning time:

Datasets | BPMLL | ML-kNN | Rank-SVM | MLTSVM | SS-MLTSVM
---|---|---|---|---|---
Flags | 4.808817 | 0.050400 | 0.359997 | 0.074584 | |
Birds | 16.868832 | 4.273635 | 0.187632 | 1.566527 | |
Emotions | 17.929608 | 1.282045 | 0.384163 | 1.153655 | |
Scene | 154.982600 | 5.794408 | 4.145783 | 12.102076 | |
Yeast | 69.403602 | 49.447880 | 19.581926 | 38.792532 | |
Average | 52.798700 | 0.868840 | 12.231600 | 10.737873 | |

In this subsection, we investigate the effect of the number of unlabelled samples on classification performance. In

From

In this paper, a novel SS-MLTSVM is proposed to solve semi-supervised multi-label classification problems. By introducing a manifold regularization term into the MLTSVM, we construct a more reasonable classifier and use SOR to speed up training. Theoretical analysis and experimental results show that, compared with existing multi-label classifiers, the SS-MLTSVM takes full advantage of the geometric information embedded in partially labelled and unlabelled samples and effectively solves semi-supervised multi-label classification problems. It should be pointed out that the SS-MLTSVM does not consider the correlations among labels, even though such correlations are valuable for improving generalization performance. Therefore, more effective methods for exploiting label correlations should be explored in future work.