^{#}Contributed equally to this work

Prostate cancer (PCa) is the second most common cancer and the main disease affecting men’s health worldwide (

Non-negative matrix factorization (NMF) is an unsupervised learning technique that offers noise reduction and interpretability (

The tumor microenvironment includes tumor cells and surrounding nontumor components, including cancer-associated fibroblasts, endothelial cells, immune cells, the extracellular matrix, and a hypoxic glycolysis condition (

However, NMF has rarely been used to classify PCa patients by glycolysis-related genes or to construct high-risk and low-risk groups from them. We therefore used NMF on glycolysis-related genes to classify PCa patients into two clusters. Further analysis revealed differences in survival probability and biological processes between the two clusters. We then established a prognostic model and a nomogram based on the differentially expressed genes (DEGs) between the clusters.

We downloaded RNA sequencing data (fragments per kilobase million, FPKM), clinical information, follow-up data, Gleason scores, and mutation data, among others, for PCa patients from the TCGA database. Samples lacking critical clinical information were excluded. Glycolysis-related genes were obtained from MSigDB (

The mRNA data and clinical information of the PCa patients were downloaded from TCGA (

To differentiate subclasses based on glycolysis-related genes, NMF was performed on the glycolysis-related gene expression profiles with the R package “NMF.” An appropriate number of clusters was needed to obtain stable and usable clusters (

Kaplan–Meier analysis was performed, and a curve was created; in our analysis, the event of interest was set to death. After Kaplan–Meier curves were created, log-rank tests were performed. The

After classification and Kaplan–Meier analysis, we explored the correlation between clusters and clinical features. The clinical features included in the analysis were T and N stages (from the TNM stage), Gleason score, race, and age. The Gleason score is a widely used grading system for PCa (

To examine the cause of significant differences in survival probability between the two clusters, we investigated the difference in immune cell infiltration between the clusters. To quantify the infiltration of immune cells, the Microenvironment Cell Populations-counter method (

To analyze mutations in the two clusters, we downloaded PCa mutation data in VarScan format from TCGA. The R package “maftools” is a tool for analyzing somatic mutations developed by Mayakonda (

Differentially expressed genes between the two clusters were identified with “limma” using |log2FC| > 0.585 (fold change > 1.5) and a false discovery rate (FDR) < 0.05. To select prognostic genes from the differentially expressed genes, we first performed univariable Cox regression. The criterion was set to a

Independent prognostic parameters in PCa patients, including risk scores and clinical characteristics, were analyzed by univariable and multivariable Cox regression. The parameters in Cox regression with

Kaplan–Meier analysis and receiver operating characteristic (ROC) curve analysis were used to assess each cohort and to verify the feasibility and prognostic value of the prognostic model. For the Kaplan–Meier analysis, the TCGA cohort was randomly divided into a training cohort and a test cohort, and GSE116918 served as the external validation cohort. Patients were assigned to the high-risk or low-risk group using the median risk score as the cutoff. Log-rank tests were then performed to test for a significant difference between the high-risk and low-risk groups. After the Kaplan–Meier curves were created, ROC analysis was performed for each cohort, with the area under the ROC curve (AUC) as a key indicator. We calculated the AUC at 1, 3, and 5 years and also investigated the performance of the prognostic model in patients with pathological scores <8 and ≥8. These analyses were based on the R packages “survival” and “timeROC.”
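The Kaplan–Meier step described above can be illustrated with a minimal sketch. The study itself used the R package “survival”; the Python function below is not the authors' code and only shows how the product-limit estimate is formed from follow-up times and death indicators. The function name `kaplan_meier` and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  : follow-up durations, one per patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) pairs, one per distinct
    time at which at least one death occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    at_risk = len(data)   # patients still under observation
    survival = 1.0        # running product of conditional survival
    curve = []

    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data (hypothetical): four patients, one censored at time 2.
    for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(t, round(s, 3))
```

Running the same function separately on the high-risk and low-risk groups gives the two curves that a log-rank test then compares; the log-rank test itself is not implemented in this sketch.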

The continuous variables were compared by Student’s

The gene sets downloaded from MSigDB were summarized and deduplicated to filter the glycolysis-related genes. GO and KEGG enrichment analyses were performed for the genes. In GO analysis, one of the ontologies, namely, biological process (BP), showed the function of the genes enriched in glycolysis-related processes, such as pyruvate metabolism and carbohydrate catabolism. Moreover, KEGG analysis showed that the genes were enriched in glycolysis/gluconeogenesis (

After the correlation test, we obtained the control samples and cancer samples in TCGA and GEO to extract the expression matrices of glycolysis-related genes. A total of 485 cancer samples were obtained from TCGA, and the GSE116918 dataset was downloaded from GEO. The expression matrices of glycolysis genes were extracted from the expression profiles from TCGA and GSE116918. We finally obtained 272 glycolysis-related genes and their expression profiles. These genes were used in the NMF analysis (

Kaplan–Meier analysis was performed and visualized (

A heatmap was created to exhibit the correlation between clusters and clinical features, including T and N stage (from TNM stage), Gleason score, race, and age (

To explore the difference in immune cell infiltration between the two clusters, we compared the degrees of infiltration of different immune cells, including B lineage, CD8 T cells, cytotoxic lymphocytes, endothelial cells, fibroblasts, monocytic lineage, myeloid dendritic cells, neutrophils, natural killer (NK) cells, and T cells (

In the mutation analysis, 118 of 200 samples in cluster 1 (59%) and 128 of 264 samples in cluster 2 (48.48%) carried a mutation. The samples in cluster 1 had the most mutations in the

Univariable Cox regression, LASSO analysis, and multivariable Cox regression were used to filter the prognostic genes (

Based on 15 glycolysis-related prognostic genes, we established a glycolysis-related prognostic model for PCa. The risk score of each patient was calculated as follows: risk score = (0.480*LAMPS exp.) + (1.591*SPRN exp.) + (0.505*ATOH1 exp.) + (1.197*TANC1 exp.) + (0.250*ETV1 exp.) + (-0.390*TDRD1 exp.) + (0.298*KLK14 exp.) + (0.673*MESP2 exp.) + (0.841*POSTN exp.) + (1.047*CRIP2 exp.) + (-0.419*NAT1 exp.) + (1.635*AKR7A3 exp.) + (0.953*PODXL exp.) + (0.544*CARTPT exp.) + (0.801*PCDHGB2 exp.).
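The risk score above is a weighted sum of expression values, which can be written out as a short sketch. The coefficients and gene symbols are copied as printed in the formula; the function names `risk_score` and `split_by_median` are hypothetical. Two points are assumptions rather than statements from the text: that scores equal to the median are assigned to the low-risk group, and that the expression values are on the same scale as those used to fit the model.

```python
from statistics import median

# Coefficients copied from the risk-score formula in the text.
COEFFICIENTS = {
    "LAMPS": 0.480,
    "SPRN": 1.591,
    "ATOH1": 0.505,
    "TANC1": 1.197,
    "ETV1": 0.250,
    "TDRD1": -0.390,
    "KLK14": 0.298,
    "MESP2": 0.673,
    "POSTN": 0.841,
    "CRIP2": 1.047,
    "NAT1": -0.419,
    "AKR7A3": 1.635,
    "PODXL": 0.953,
    "CARTPT": 0.544,
    "PCDHGB2": 0.801,
}


def risk_score(expression):
    """Weighted sum of the 15 gene expression values for one patient.

    expression: dict mapping gene symbol -> expression value.
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the cohort median.

    scores: dict mapping patient ID -> risk score.
    Assumption: a score equal to the median is labelled 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
```

Genes with negative coefficients (TDRD1 and NAT1) lower the score as their expression rises, while the remaining 13 genes raise it.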

The TCGA samples were categorized into training and test groups. To verify the performance of predicting survival, Kaplan–Meier analysis and ROC analysis were performed. There were four different groups, including the TCGA all group (total samples from TCGA), the TCGA training group, the TCGA test group, and an external data validation group of samples from GEO (GEO group) (

The result was that the

We constructed a nomogram in the TCGA training group to quantitatively estimate the survival probability for PCa patients in the clinical setting. Univariable and multivariable Cox regression for risk scores and clinical characteristics were performed to screen independent prognostic factors. In the univariable Cox regression, Gleason score (

Despite the low mortality of patients with PCa, one-third of men experience relapse after treatment, and advanced PCa may eventually progress to castration-resistant disease (

Some studies support the influence of glycolysis-related genes on the prognosis of patients with PCa (

In our model,

While establishing the prognostic model, we found a significant difference in survival probability between the clusters. Therefore, we investigated the biological differences between them. The TP53 protein is commonly regarded as a tumor suppressor protein (

Immune cell infiltration plays a complex biological role in the development and progression of PCa (

Our study is the first to classify PCa by NMF based on glycolysis-related genes and to establish a prognostic model with good performance on that basis. A series of validations, including internal TCGA and external GEO validation, confirmed that the 15 genes identified through glycolysis-based NMF analysis have good prognostic potential (the 1-year AUC of the TCGA all group was 0.892). The model may support prognosis assessment for PCa patients and help them choose appropriate treatment options. In addition, the glycolysis-related genes were shown to be associated with the prognosis of PCa patients in this study, which also provides new insight into the pathogenesis of PCa. However, our study has limitations. All datasets used for training and testing came from TCGA and GEO, so the prognostic performance of the model should be further verified.
