Clustering is a crucial method for deciphering data structure and producing new information. Because it can reveal fundamental connections between the human brain and events, clustering is an essential tool for cognitive research. One of the most challenging clustering problems is dealing with noisy data caused by inaccurate synthesis from several sources or by misleading data production processes. Noisy data can lead to incorrect object recognition and inference. This research proposes a novel clustering approach, named Picture-Neutrosophic Trusted Safe Semi-Supervised Fuzzy Clustering (PNTS3FCM), to solve the clustering problem with noisy data using the neutral and refusal degrees in the definitions of the Picture Fuzzy Set (PFS) and the Neutrosophic Set (NS). Our contribution is a new optimization model with four essential components: clustering, outlier removal, safe semi-supervised fuzzy clustering and partitioning with labeled and unlabeled data. The effectiveness and flexibility of the proposed technique are evaluated and compared with the state-of-the-art methods, standard picture fuzzy clustering (FC-PFS) and confidence-weighted safe semi-supervised clustering (CS3FCM), on benchmark UCI datasets. The experimental results show that our method outperforms the compared methods on at least 10 of the 15 datasets in terms of clustering quality and computational time.

The finding of underlying connections between the human brain and events has made the development of sophisticated clustering algorithms fashionable in cognitive research [

Semi-supervised fuzzy clustering techniques were introduced with additional information provided by users [

The safe semi-supervised fuzzy clustering approach introduced in [

To establish the safe level of each sample in the data set, Guo et al. [

This research aims to develop a new clustering method to remove the noise from data and increase the performance of the clustering method. This method integrates the semi-supervised clustering method and the picture fuzzy set [

Based on the original Fuzzy C-Means (FCM) model, a fuzzy clustering algorithm on picture fuzzy sets (FC-PFS) introduced in [

To enhance “safe information” and reduce the effect of “noisy data”, Picture-Neutrosophic Trusted Safe Semi-Supervised Fuzzy Clustering (PNTS3FCM) is introduced. This is a new technique for partitioning data that contain noisy information. The PNTS3FCM approach incorporates picture fuzzy and neutrosophic set concepts into semi-supervised fuzzy clustering with a safe information procedure. The research proposes a new optimization model consisting of four essential components: a clustering component, an outlier-handling component, and two safe semi-supervised fuzzy clustering components using labeled and unlabeled data. The first two components are taken from FC-PFS, and the last two are new components that enhance safe information and reduce noisy data. An iterative scheme derived from the formulation is also provided to compute the cluster centers and memberships. In fact, the survey has revealed a new field of study: safe semi-supervised clustering on the picture fuzzy set. To compare PNTS3FCM with other available methods on benchmark datasets, two similar algorithms-FC-PFS [

The remainder of this paper is structured as follows: Section 2 presents the background underpinning our study. The proposed approach is introduced in Section 3, and the experimental results are presented in Section 4. Conclusions are given in the last section.

This section presents some fundamental concepts and methods of semi-supervised clustering, including safe semi-supervised clustering, the picture fuzzy set, and picture fuzzy clustering.
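As background for the methods reviewed in this section, the classical FCM alternating update on which they build can be sketched as follows. This is a minimal sketch of textbook FCM only, assuming Euclidean distance and a fuzzifier `m`; it is not the S3FCM, FC-PFS or PNTS3FCM update, and the function name `fcm` and its parameters are illustrative choices, not taken from the original work.

```python
import math
import random


def fcm(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook Fuzzy C-Means. Returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Initialise the centers with randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    u = [[0.0] * n_clusters for _ in range(n)]
    for _ in range(n_iter):
        # Membership update: u_kj is proportional to d_kj^(-2/(m-1)).
        for k, x in enumerate(points):
            d = [math.dist(x, c) for c in centers]
            zeros = [j for j, dj in enumerate(d) if dj == 0.0]
            if zeros:
                # The point coincides with a center: assign it crisply.
                u[k] = [1.0 / len(zeros) if j in zeros else 0.0
                        for j in range(n_clusters)]
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                total = sum(inv)
                u[k] = [v / total for v in inv]
        # Center update: mean of the data weighted by u_kj^m.
        new_centers = []
        for j in range(n_clusters):
            w = [u[k][j] ** m for k in range(n)]
            sw = sum(w)
            new_centers.append(
                [sum(w[k] * points[k][a] for k in range(n)) / sw
                 for a in range(dim)]
            )
        shift = max(math.dist(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Each row of the returned membership matrix sums to 1, and a crisp label for an element can be obtained by taking the cluster with the largest membership.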

Safe semi-supervised fuzzy clustering approaches, including S3FCM [

For S3FCM, consider the dataset ^{th} element belonging to the ^{th} cluster is characterized by

_{ik} is the distance between the ^{th} element and ^{th} cluster. The final cluster labels are determined through the algorithm [

The below function calculates the center

On the other hand, the LHC-S3FCM [

Therefore, the cluster centers

Another approach of FCM is Confidence-weighted Safe Semi-supervised Clustering (CS3FCM) [

The methods of Gan [

By generalizing the fuzzy set in [

Then, the refusal degree is computed by function:
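In the standard definition of a picture fuzzy set, each element has a positive, a neutral and a negative degree, each in [0, 1], whose sum does not exceed 1, and the refusal degree is the remainder, 1 minus that sum. A minimal sketch under that assumption is given here; the function name `refusal_degree` and its argument names are illustrative.

```python
def refusal_degree(positive, neutral, negative):
    """Refusal degree of a picture fuzzy element.

    Assumes the standard picture fuzzy set constraints: every degree lies
    in [0, 1] and positive + neutral + negative <= 1.
    """
    degrees = (("positive", positive), ("neutral", neutral), ("negative", negative))
    for name, value in degrees:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} degree must lie in [0, 1]")
    total = positive + neutral + negative
    if total > 1.0 + 1e-12:
        raise ValueError("the three degrees must sum to at most 1")
    # Clamp at zero to absorb floating-point round-off.
    return max(0.0, 1.0 - total)
```

For example, degrees of 0.5, 0.2 and 0.1 leave a refusal degree of about 0.2, while an ordinary fuzzy membership with no neutral degree and a negative degree of one minus the positive degree leaves none.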

The objective of FC-PFS [

The values of

For the above objective function, the cluster centers

In [

The detailed steps for the FC-PFS algorithm are shown below.

The idea behind the proposed method (PNTS3FCM) is the combination between PFS and safe semi-supervised fuzzy clustering by introducing a novel objective function with four primary components. The first and the second stages are employed from the original picture fuzzy clustering method [

To deal with “safe information”, the last two stages coordinate the safe semi-supervised fuzzy clustering using both labeled and unlabeled data. PNTS3FCM has two phases. In the first phase, FC-PFS is used to partition all data and obtain a clustering result with positive, neutral and refusal values. The second phase uses all data together with these values to partition the data again, achieving better clustering quality by enhancing safe data information and reducing noisy data.

The technique produces final clusters that are reliable and confident. We will discuss the formulation and algorithm for this concept in the next section.

Following the main idea above, this section describes the proposed model in detail. The objective function is stated by the following formula:

With the constraints

The denominator

Finally,

The

Using the Lagrangian method, the optimal solutions to the stated problem are presented in

The positive degree

The positive degree

Other degrees are shown below:

Details of the PNTS3FCM algorithm are below.

The proposed method has the following advantages:

a) PNTS3FCM has better clustering quality than the related methods, FC-PFS and CS3FCM, due to its capability to handle noisy data.

b) PNTS3FCM produces more information about the clusters, such as the cluster centers and the picture fuzzy degrees (positive, neutral, negative, refusal). It deals with both “safe information” and “noisy data”.

c) PNTS3FCM combines three major concepts: safe clustering, semi-supervised clustering and the picture fuzzy set. This combination is the first such attempt in the literature aimed at practical problems.

It also has the following limitations:

a) PNTS3FCM takes more computational time than the other algorithms due to the calculation of two additional parts in the objective function.

b) The model contains many parameters that must be tuned in real-world applications.

The experiments are performed on a Core i5-powered HP laptop using the C programming language. The selected benchmark UCI datasets [

Dataset | No. of records | No. of attributes | No. of clusters |
---|---|---|---|
Australian | 690 | 14 | 2 |
Balance-scale | 625 | 4 | 3 |
Dermatology | 366 | 34 | 6 |
Heart | 270 | 13 | 2 |
Iris | 150 | 4 | 3 |
Spambase | 4601 | 57 | 2 |
Tae | 151 | 5 | 3 |
Waveform | 5000 | 40 | 3 |
WDBC | 569 | 30 | 2 |

Dataset | No. of samples | No. of features | No. of clusters | Outliers (%) |
---|---|---|---|---|
Ecoli | 336 | 7 | 8 | 2.6 |
Glass | 214 | 9 | 6 | 4.2 |
Yeast | 1364 | 8 | 10 | 4.7 |
Wine | 178 | 13 | 3 | 7.7 |
Vertebral | 310 | 6 | 3 | 12.5 |
Ionosphere | 351 | 34 | 2 | 36 |

Experiments are executed to compare the proposed PNTS3FCM approach and the state-of-art methods, CS3FCM [

The value of ASWC is computed by

The value of PBM [

^{th}

The DB [27] is determined by

^{th} cluster. In which

The average value and standard deviation value in experimental results are denoted as Ave and STD Dev, respectively.
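Of the validity indices used here, the Davies-Bouldin (DB) index has a widely used standard form: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation against any other cluster, then average over clusters, with lower values indicating more compact and better separated clusters. The sketch below assumes crisp labels (for example, obtained by defuzzifying the memberships) and Euclidean distance; the exact variant used in the experiments may differ, and the function name `davies_bouldin` is illustrative.

```python
import math


def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index for a crisp partition (lower is better)."""
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    centroids, scatters = {}, {}
    for lab, members in clusters.items():
        dim = len(members[0])
        c = [sum(p[a] for p in members) / len(members) for a in range(dim)]
        centroids[lab] = c
        # Scatter: mean distance of the members to their centroid.
        scatters[lab] = sum(math.dist(p, c) for p in members) / len(members)

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        # Assumes distinct centroids; coincident centroids would divide by zero.
        total += max(
            (scatters[i] + scatters[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)
```

As a usage example, two clusters with unit scatter whose centroids are 10 apart give a ratio of (1 + 1) / 10 for each cluster, so the index is 0.2.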

Herein, the proposed method is assessed by classification accuracy in two situations: on all data and on labeled data only. The experimental results are presented for each of these cases in turn.

Using all data elements of the 15 datasets, the classification accuracy of PNTS3FCM, FC-PFS and CS3FCM is calculated and presented as follows.
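The text does not spell out how cluster indices are matched to class labels when classification accuracy is computed. One common convention, assumed here purely for illustration, maps each cluster to its majority class and counts the elements that agree with that class. The function name `majority_vote_accuracy` is hypothetical, and the reported figures may have been obtained with a different matching rule.

```python
from collections import Counter


def majority_vote_accuracy(true_labels, cluster_ids):
    """Accuracy after mapping every cluster to its most frequent true class.

    This is an assumed convention, not necessarily the one used to produce
    the reported results.
    """
    if len(true_labels) != len(cluster_ids) or len(true_labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    by_cluster = {}
    for truth, cluster in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cluster, Counter())[truth] += 1
    # Elements belonging to the majority class of their cluster count as correct.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(true_labels)
```

Restricting both input lists to the labeled elements gives the labeled-data variant of the same measure.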

Dataset | PNTS3FCM Ave | PNTS3FCM STD Dev | FC-PFS Ave | FC-PFS STD Dev | CS3FCM Ave | CS3FCM STD Dev |
---|---|---|---|---|---|---|
Australian | 0.69002 | 0.00786 | 0.61856 | 0.00063 | | 0.00254 |
Balance-scale | | 0.00556 | 0.51412 | 0.01218 | 0.51685 | 0.01939 |
Dermatology | | 0.01209 | 0.55878 | 0.01647 | 0.64483 | 0.01245 |
Heart | | 0.00790 | 0.6552 | 0.00195 | 0.7421 | 0.00232 |
Iris | | 0.00392 | 0.92299 | 0.01076 | 0.89076 | 0.01963 |
Spambase | | 0.01458 | 0.77682 | 0.00924 | 0.75396 | 0.0088 |
Tae | | 0.03453 | 0.47875 | 0.00337 | 0.45421 | 0.00663 |
Waveform | | 0.01140 | 0.55104 | 0.0074 | 0.52295 | 0.0076 |
WDBC | 0.77269 | 0.00921 | 0.70639 | 0.0024 | | 0.0026 |

As shown in

From the results in

Dataset | PNTS3FCM Ave | PNTS3FCM STD Dev | FC-PFS Ave | FC-PFS STD Dev | CS3FCM Ave | CS3FCM STD Dev |
---|---|---|---|---|---|---|
Ecoli | 0.51301 | 0.00936 | 0.51399 | 0.00509 | | 0.00974 |
Glass | | 0.00741 | 0.42114 | 0.0035 | 0.42905 | 0.00625 |
Yeast | | 0.02388 | 0.32437 | 0.00401 | 0.32909 | 0.0101 |
Wine | 0.89228 | 0.01452 | | 0.0015 | 0.92186 | 0.00484 |
Vertebral | | 0.00830 | 0.48733 | 0.00227 | 0.51600 | 0.00655 |
Ionosphere | | 0.00102 | 0.52571 | 0.00037 | 0.53613 | 0.00053 |

Using the labeled data elements of the 15 datasets, the classification accuracy (CA) of PNTS3FCM, FC-PFS and CS3FCM is calculated and presented as follows.

Dataset | PNTS3FCM Ave | PNTS3FCM STD Dev | FC-PFS Ave | FC-PFS STD Dev | CS3FCM Ave | CS3FCM STD Dev |
---|---|---|---|---|---|---|
Australian | | 0.03576 | 0.59368 | 0.02729 | 0.73395 | 0.00452 |
Balance-scale | | 0.07930 | 0.47638 | 0.02352 | 0.53585 | 0.01231 |
Dermatology | | 0.05447 | 0.44638 | 0.04177 | 0.47918 | 0.03509 |
Heart | | 0.04002 | 0.61693 | 0.03477 | 0.74145 | 0.00235 |
Iris | 0.83296 | 0.02356 | 0.77855 | 0.09295 | | 0.01784 |
Spambase | | 0.06753 | 0.69296 | 0.05677 | 0.70896 | 0.02728 |
Tae | | 0.06706 | 0.53238 | 0.03506 | 0.55857 | 0.00782 |
Waveform | | 0.05428 | 0.50336 | 0.03123 | 0.52359 | 0.0094 |
WDBC | 0.70137 | 0.05464 | 0.6598 | 0.05484 | | 0.00795 |

In

Dataset | PNTS3FCM Ave | PNTS3FCM STD Dev | FC-PFS Ave | FC-PFS STD Dev | CS3FCM Ave | CS3FCM STD Dev |
---|---|---|---|---|---|---|
Ecoli | | 0.104192044 | 0.55551 | 0.05475 | 0.51386 | 0.01736 |
Glass | | 0.068188815 | 0.43231 | 0.02221 | 0.44376 | 0.00876 |
Yeast | | 0.10076 | 0.29044 | 0.02543 | 0.35129 | 0.01674 |
Wine | 0.76932 | 0.072052299 | 0.80023 | 0.09646 | | 0.02488 |
Vertebral | | 0.058126313 | 0.49862 | 0.03127 | 0.58464 | 0.01392 |
Ionosphere | 0.52633 | 0.001123518 | 0.53435 | 0.01013 | | 0.00212 |

Dataset | PNTS3FCM Ave | PNTS3FCM STD Dev | FC-PFS Ave | FC-PFS STD Dev | CS3FCM Ave | CS3FCM STD Dev |
---|---|---|---|---|---|---|
Australian | | 0.07284 | 3.59062 | 0.29431 | 3.80124 | 0.24324 |
Balance-scale | 6.54266 | 0.69000 | 52.46293 | 4.3444 | | 0.22229 |
Dermatology | | 0.76929 | 15.64693 | 7.57343 | 18.65225 | 3.36386 |
Heart | | 0.08336 | 5.1217 | 4.26367 | 3.94698 | 0.19342 |
Iris | 2.95875 | 0.02369 | | 0.0293 | 3.57822 | 0.21908 |
Spambase | 31.26761 | 10.24416 | 33.89317 | 0.26046 | | 0.16618 |
Tae | | 0.01591 | 3.8189 | 0.01948 | 3.70213 | 0.01689 |
Waveform | | 2.92785 | 15.15752 | 3.34036 | 16.15447 | 2.87025 |
WDBC | | 0.02410 | 2.41815 | 0.00587 | 2.83812 | 0.05349 |
Ecoli | | 0.28630 | 6.49862 | 0.16336 | 8.88006 | 0.38829 |
Glass | | 0.16483 | 6.62155 | 0.32037 | 6.4242 | 0.52104 |
Yeast | | 0.58698 | 28.00517 | 2.14368 | 12.03848 | 1.11802 |
Wine | | 0.01133 | 2.91238 | 0.00254 | 4.10052 | 0.10899 |
Vertebral | 3.11417 | 0.02677 | | 0.0136 | 3.82931 | 0.09941 |
Ionosphere | 2.98972 | 0.01699 | | 0.01212 | 3.3906 | 0.06444 |

We compare the computational time of PNTS3FCM and CS3FCM on the 15 datasets.

METHOD | PNTS3FCM | CS3FCM | ||
---|---|---|---|---|
Ave | STD Dev | Ave | STD Dev | |
Australian | 0.01027 | 0.32286 | 0.02052 | |
Balance-scale | 0.00931 | 1.20235 | 0.00396 | |
Dermatology | 0.11784 | 1.38068 | 0.05733 | |
Heart | 0.06533 | 0.00100 | 0.00315 | |
Iris | 0.02343 | 0.02542 | 0.00312 | |
Spambase | 0.18252 | 3.16697 | 0.87895 | |
Tae | 0.04673 | 0.00202 | 0.00480 | |
Waveform | 0.09443 | 8.08507 | 4.26694 | |
WDBC | 0.00334 | 0.39131 | 0.00786 | |
Ecoli | 1.71820 | 0.14015 | 0.06711 | |
Glass | 0.57863 | 0.02675 | 0.01662 | |
Yeast | 11.35089 | 0.41673 | 0.80203 | |
Wine | 0.05700 | 0.00268 | 0.00182 | |
Vertebral | 0.00255 | 0.18393 | 0.00592 | |
Ionosphere | 0.00606 | 0.45496 | 0.01373 |

This research proposed a novel technique called Picture-Neutrosophic Trusted Safe Semi-Supervised Fuzzy Clustering (PNTS3FCM) to address the problem of clustering data with high confidence in the presence of noisy information. PNTS3FCM is constructed by combining Picture Fuzzy Sets (PFS), Neutrosophic Sets and safe semi-supervised fuzzy clustering. The method consists of four critical parts: the clustering part, the outlier-handling part, and the safe semi-supervised fuzzy clustering parts with labeled and unlabeled data. Through the use of PFS, the first part of PNTS3FCM aims to reduce the distance between data elements and cluster centers. The second part processes the “noisy data” by integrating the entropy quantity between the neutral and refusal degrees. The third and fourth parts coordinate the safe semi-supervised fuzzy clustering using both labeled and unlabeled data to handle safe information. We also provide an iterative scheme derived from the formulation to compute the cluster centers and memberships. The method produces final clusters that are reliable and confident.

PNTS3FCM has demonstrated its effectiveness in comparison with two related methods, FC-PFS and CS3FCM. The experimental results show that PNTS3FCM outperforms the others on most of the benchmark datasets in terms of clustering quality and computational time. Although the proposed PNTS3FCM mainly focuses on eliminating or reducing noisy data elements, it still has some limitations. First, PNTS3FCM takes a long time to compute. Second, it requires a larger number of parameters. In future work, an effective optimization algorithm will be studied and introduced to overcome these limitations.

We are grateful for the support from the staff of the Institute of Information Technology, Vietnam Academy of Science and Technology.

This research is funded by Graduate

The authors declare that they have no conflicts of interest to report regarding the present study.