Ferroptosis is a newly discovered form of cell death characterized by iron-dependent lipid peroxidation. This study aims to investigate the potential correlation between ferroptosis and the prognosis of lung adenocarcinoma (LUAD).
RNA-seq data were collected from the LUAD dataset of The Cancer Genome Atlas (TCGA) database. Based on ferroptosis-related genes, differentially expressed genes (DEGs) between LUAD and paracancerous specimens were identified. The univariate Cox regression analysis was performed to screen key genes associated with the prognosis of LUAD. LUAD patients were divided into the training set and validation set. Then, we screened out key genes and built a prognostic prediction model involving 5 genes using the least absolute shrinkage and selection operator (LASSO) regression with tenfold cross-validation and the multivariate Cox regression analysis. After dividing LUAD patients based on the median level of risk score as cut-off value, the generated prognostic prediction model was validated in the validation set. Moreover, we analyzed the somatic mutations, and estimated the scores of immune infiltration in the high-risk and low-risk groups. Functional enrichment analysis of DEGs was performed as well.
High-risk scores indicated the worse prognosis of LUAD. The maximum area under curve (AUC) of the training set and the validation set in this study was 0.7 and 0.69, respectively. Moreover, we integrated the age, gender, and tumor stage to construct the composite nomogram. The charts indicated that the AUC of LUAD cases with the survival time of 1, 3 and 5 years was 0.698, 0.71 and 0.73, respectively. In addition, the mutation frequency of LUAD patients in the high-risk group was significantly higher than that in the low-risk group. Simultaneously, DEGs were mainly enriched in ferroptosis-related pathways by analyzing the functional results.
This study constructs a novel LUAD prognosis prediction model involving 5 ferroptosis-related genes, which can be used as a promising tool for decision-making of clinical therapeutic strategies of LUAD.
The ferroptosis is an iron-dependent form of regulated cell death (RCD) that has been recently discovered. It differs from the apoptosis, necrosis, and autophagy . The implementation of ferroptosis requires the activation of the following three ferroptosis features: The oxidation of phospholipids containing polyunsaturated fatty acids (PUFA); The presence of redox active iron; and the loss of lipid peroxide repair abilities . With the in-depth analysis of ferroptosis, the induction of ferroptosis has been identified as a vital event involved in pathological progressions, including human tumors. Preliminary evidences have suggested the regulatory effect of ferroptosis on the growth of various types of cancers like renal cancer, pancreatic cancer, non-small-cell lung cancer (NSCLC) and diffuse large B-cell lymphoma . The ferroptosis has been identified to suppress tumor growth and the progression, and as a result, the induction of ferroptosis has emerged as a promising anti-cancer treatment .
Lung cancer is the most prevalent one among malignancies and it is the chief leading cause of tumor-related deaths worldwide. Pathologically categorized, about 85% of lung cancer cases belong to NSCLC, of which lung adenocarcinoma (LUAD) is one of the most frequent subtypes . With the wide-spreading application of targeted drugs and immune checkpoint inhibitors, therapeutic options of patients with LUAD have revolutionarily changed. However, the prognosis of metastatic or recurrent LUAD is still far away from satisfying . Besides, the overall survival of lung cancer patients significantly varies across the world, with a 5-year survival of 21.2% in the United States, which can be higher than that in China . Recent studies have reported that up-regulation of the GSH synthesis pathway in NSCLC cells can suppress ferroptosis ,9. NFS1, as a ferroptosis-related gene, is detected highly expressed in LUAD cells . In addition, ferroptosis is also correlated to the prognosis of renal carcinoma and hepatocellular carcinoma ,12. We therefore speculated whether ferroptosis is correlated to the prognosis of LUAD, and the possible involvement of ferroptosis-related genes.
RNA sequencing data of ferroptosis-related genes and clinical information of LUAD patients were downloaded from the public databases. It is shown that expression levels of ferroptosis-related genes were correlated to survival outcomes of LUAD. In the present study, LUAD patients were divided into a training set and a validation set based on the random stratified sampling of tumor stages. Then, we established the multi-gene LUAD prognosis prediction model and calculated risk scores through the LASSO regression with tenfold cross-validation, and univariate and multivariate Cox regression analyses. Finally, the established model was verified in the validation set and the overall sample set, aiming to test the fitting degree of the model. To explore the underlying molecular mechanism of the difference in the prognosis of LUAD, we further performed immune and biological functional enrichment analyses.
Materials and methods
Data acquisition and preliminary processing
LUAD is a frequently detected subtype of NSCLC. LUAD dataset obtained from the TCGA database (https://portal.gdc.cancer.gov/) involved 533 cancer specimens and 59 paracancerous ones. Their raw RNA-Seq data, single-nucleotide variation (SNV) data and clinical information were included as well. The mRNA expression data were normalized using the DESeq2 variance stabilizing transformation (VST). Based on the previous research, we selected the top 60 ferroptosis-related genes as the candidate gene set (Additional file 5: Table S1) [4, 13,14,15].
Analysis of DEGs and model establishment
The DESeq2 package in R  was utilized to analyze DEGs between LUAD specimens and paracancerous ones based on the false discovery rate (FDR) < 0.05. The Cox proportional-hazards model is a type of semiparametric regression model, in which survival outcomes and survival time are considered as the dependent variables, and the impact of multiple factors on survival time is analyzed. The most crucial concept in the model is hazard ratio (HR). Generally speaking, HR > 1 and HR < 1 indicate a risk and protective prognostic factor in cancer dataset, respectively. The values in the matrix of VST were utilized to the following univariate Cox regression analysis that assessed potential influences of ferroptosis-related genes on the overall survival of LUAD. P values were adjusted with the Benjamini & Hochberg (BH) procedure. Genes with HR ≠ 1 and p < 0.05 were selected as prognosis-related genes. Subsequently, the intersection set between the identified DEGs and prognosis-related genes was taken, and their downstream functions in influencing the prognosis of LUAD were further analyzed.
The glmnet package in R was utilized to perform LASSO regression analysis on the target gene set  and the establishment of the multi-gene prognostic prediction model. To avoid over-fitting and obtain reliable results, we applied the tenfold cross-validation to acquire optimal lambda values from the minimum partial likelihood deviance, and screened out representative genes. A few candidates were selected to establish a multivariate Cox regression model . Using the coxph function, the PH test of each factor was performed, and the VIF and correlation coefficient of each factor in each regression model were calculated as well. Meanwhile, the possible collinearity of factors was determined. Variables that conformed to the PH hypothesis and collinearity tests were re-modeled. Risk scores of LUAD patients were calculated according to the modeling results. The formula was established as follows: Risk score = Sum (expression level of each gene × corresponding coefficient). Expression level of each gene was the normalized mRNA expression, and the coefficient was the result of the multivariate Cox regression analysis.
To explore the survival predictive feasibility of risk scores obtained from the multi-gene prognostic prediction model, we used the "survival" and "survminer" packages and the Kaplan–Meier method to estimate the survival curve. The time-dependent ROC curve was plotted by the "survival ROC" package in G, which graphically displayed the predictive ability in the prognosis of LUAD at different time points.
Construction of the prognostic composite nomogram and its verification
To establish a more reliable prediction method that could be applied in clinical practice, we constructed the composite nomogram through the rms package. Similarly, clinical features and risk scores of LUAD patients were combined to establish a multivariate Cox regression model. Based on the influenced degrees of risk factors on the outcome, they were graded to value its level, and a total score was obtained by the sum of the score of each factor. Finally, through the converse relation between the total scores and the outcome events, the predicted survival time of LUAD patients was calculated.
We further verified the accuracy of the nomogram by applying the model using the C-statistic, calibration curve and time-dependent ROC curve. Meanwhile, the nomogram model was calibrated to predict the proximity between the predicted survival and the actual result numerically. The Hosmer–Lemeshow (H–L) test was performed by dividing data into 3 ascending ordered groups (tertiles) based on the predicted result obtained from the model, thus testing the goodness of fit for the χ2 test. Furthermore, the average predicted survival was compared with the actual event rate estimated using the Kaplan–Meier method.
The enrichment analysis of DEGs function and the enrichment score of immune gene set
DEGs between high-risk groups and low-risk groups divided by risk scores were obtained. Subsequently, enrichment analysis of gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of DEGs were analyzed using the clusterprofiler package in R .
The single-sample gene set enrichment analysis (ssGSEA) on 29 immune gene sets  involving 16 immune cells and 13 immune-related activation pathways was performed using the gsva package in R  (Additional file 6: Table S2). The degree of immune infiltration of each sample was recorded.
The sample function in R was used to perform the random stratified sampling. Meanwhile, the heatmap of DEGs and SNV mutation landscape were respectively depicted by the pheatmap and maftools packages in R . Wilcoxon-test was used to test the difference between groups of the boxplot. Kaplan–Meier survival curve was depicted for assessing the prognostic potential, followed by the log-rank test to compare the difference between curves. The R software (Version 3.6.0) was used for all statistical analyses. The flowchart of this study was shown in Fig. 1.
Identifying prognosis- related DEGs that were associated with ferroptosis
The downloaded LUAD dataset contained 533 LUAD specimens and 59 paracancerous ones. Among them, 502 LUAD cases had clinical information of the overall survival (Table 1). Besides, the DESeq2 package was used to analyze the mRNA expression data of the original counts for obtaining DEGs (FDR < 0.05). The acquired DEGs were sorted according to the value of log2 fold change, and they were displayed by depicting the heatmap (Fig. 2a). Univariate Cox regression analysis revealed that among the 60 ferroptosis-related genes, 11 were significantly correlated to the prognosis of LUAD (adj p < 0.05). Taking the intersection of the 11 genes and DEGs, it is obviously shown that the former ones were differentially expressed between LUAD and paracancerous specimens. In addition, the results of Cox regression analysis were depicted in the forest plots (Fig. 2b).
The multi-gene prognostic model in the training set
We randomly stratified LUAD patients according to their tumor stage and divided them into the training set (n = 251) and validation set (n = 251). Meanwhile, LASSO-Cox regression analysis of the above-mentioned set was performed to identify prognosis-related genes that were correlated to ferroptosis (Fig. 3a, b). The results revealed that a total of 5 genes conformed to the pH-partition hypothesis (Additional file 1: Figure S1) and the collinearity test (correlation coefficient less than 0.5) (Fig. 3c), namely the ACSL4, GSS, ACSL3, PEBP1 and PGD genes. Through plotting the forest plots of them, it is disclosed that the ACSL4 (adj P < 0.05), GSS, ACSL3 and PGD genes were the risk factors for the prognosis of LUAD, while the PEBP1 gene was a protective factor (adj P < 0.05) (Fig. 3d). Furthermore, LUAD patients were subgrouped based on the expression level of the 5 identified genes, and the corresponding Kaplan–Meier survival curves were displayed in Fig. 4. Besides, the threshold of the expression level of each gene was calculated by the surv_cutpoint function from the survminer R package (Additional file 2: Figure S2). We subsequently calculated the risk score of each sample based on the modeling results using the following formula: Risk score = 0.2226 × expression level of ACSL4 + 0.2373 × expression level of GSS + 0.4369 × expression level of ACSL3—0.4417 × expression level of PEBP1 + 0.1431 × expression level of PGD. Expression level of each gene was normalized. LUAD patients were further divided into the high-risk group and low-risk group based on the individualized risk score. The grouped threshold was determined by the median value of the risk score of all LUAD patients in the training set, the median of which was 6.64. Kaplan–Meier survival curves based on the overall survival calculated using the KM algorithm have indicated that LUAD patients in the high-risk group had a worse survival than that of the low-risk group (p = 0.0041). Later, the predictive potential of the risk score model in LUAD was assessed by depicting the ROC curve, and the highest AUC was 0.7 (Fig. 5a).
Verification of the prognostic model
To determine the reliability of the established model and the predictive accuracy, results obtained from the prognostic model were validated in the verification set. Similarly, risk scores in the validation set, and subgroup classification of LUAD patients into the high-risk group and low-risk group with the same threshold (cut-off = 6.64) were conducted. Moreover, the survival status of LUAD patients in the validation set was estimated using the KM algorithm, followed by plotting the survival curve. It is consistently shown that the prognosis of LUAD patients in the high-risk group was significantly worse than that of the low-risk group (p = 0.00042), and the maximum AUC was 0.69 (Fig. 5b). Finally, we combined the training and validation set, calculated the risk score of the 502 patients according to the similar models, and performed survival analysis and plotted the survival curves of the high-risk group (n = 257) and low-risk group (n = 245) using the same threshold (Fig. 5c). As expected, LUAD patients in the high-risk group presented a worse prognosis (p < 0.0001), and the maximum AUC was 0.69.
Construction and verification of the composite nomogram combining clinical information
To better apply the model to the actual clinical situation, clinical features of LUAD patients were introduced into the evaluations of the model and thus a composite nomogram to predict the survival probabilities of LUAD patients was constructed. By introducing the age, gender, tumor stage (I/II vs III/IV) and risk score as variables in the multivariate Cox regression analysis, and incorporating the above factors into the model as variables, a nomogram was obtained (Fig. 6a), in which, the line segment corresponding to each variable was marked with a scale that represented the value range of the variable, and the length of the line segment reflected the contribution of the factor to survival. The points in the nomogram represented the individual scores corresponding to each variable under different values and total points. The corresponding individual scores after introducing all the variables were summed to produce the total score. The last three lines represented the total score of 1-year, 3-year and 5-year survival rate. To verify the prognostic value of the nomogram model, the predictive ability of the nomogram was evaluated by tROC, and the AUC corresponding to 1-year, 3-year and 5-year survival was 0.698, 0.718 and 0.731, respectively (Fig. 6b). The prediction result of the C statistics on the nomogram model was 0.677 (95%CI, 0.633–0.721). The check chart indicated a good agreement between the predicted and actual results (Fig. 6c).
SNV mutation landscape variation between the high-risk group and low-risk group
We downloaded the SNV data of 502 LUAD patients, and subgrouped based on the risk score. Each figure of the mutation landscape indicated the top 20 genes with the highest mutation frequency, and their mutations types were labeled with different colors (Fig. 7). The upper and right bar graph represented the total number of mutations in each sample, and the mutation frequency of each gene in all samples. It is revealed that genes with a higher mutation frequency in the high-risk group were consistent with those in the low-risk group, while their mutation frequencies were higher in the high-risk group [TP53 (53%), TTN (50%), MUC16 (42%), CSMD3 (40%), and RYR2 (39%)].
The GO and KEGG enrichment analysis
To further explore the differences in biological functions and pathways between the high-risk group and low-risk group, the DESeq2 package was used to analyze expression levels of DEGs between the high-risk group and low-risk group (logFC > 1, FDR < 0.05). DEGs screened out from both groups were later subjected to GO and KEGG enrichment analysis. The results of pathway enrichment and GO enrichment were shown in Fig. 8, and Additional file 7, 8: Table S3–S4. It is shown multiple pathways were enriched in the metabolism. In detail, the mainly enriched pathways included the neuroactive ligand-receptor interaction, metabolism of xenobiotics by cytochrome P450, steroid hormone biosynthesis, staphylococcus aureus infection pathway, IL-17 signaling pathway, retinol metabolic pathway, etc. The results of GO enrichment showed that the biological functions and processes of DEGs were mainly enriched in the keratinization, axoneme assembly, and microtubule bundle formation. Enrichment results could support the reasons for the variations in the prognosis of LUAD from a functional level. Moreover, some pathways and functions were identified closely correlated to ferroptosis or iron metabolism, which provided references for further research.
The correlation between risk score and immune status in LUAD
According to the threshold of risk score, the correlation between risk score and immune status in LUAD patients was further explored. We obtained a total of 29 immune-related gene sets involving multiple immune cells and immune-related functions or pathways. To quantify the enrichment degree of transcriptome data in the 29 immune gene sets, separate enrichment scores for each paring of a sample between the high-risk risk and low-risk group were estimated by the ssGSEA and compared by depicting a boxplot (Fig. 9). The enrichment scores of concentrations of immune cell genes in aDCs, iDCs, B-cells, mast cells, neutrophils, NK cells, T helper cells, Th2 cells and TIL were significantly different between the high-risk group and low-risk group (p < 0.05). Besides, the enrichment scores of these immune function gene sets in HLA, inflammation promoting, MHC class I, T cell co-stimulation and type II IFN response were also significantly different between groups (p < 0.05).
The current study systematically identified the correlation between 60 ferroptosis-related genes and the prognosis (overall survival) of LUAD. In this study, the LUAD dataset was divided into the training set and validation set by the random stratified sampling method. The prognostic prediction model involving 5 genes was established for the training set samples through the LASSO regression with tenfold cross-validation and univariate and multivariate Cox regression analysis, which was validated in the actual clinical practice.
Previous evidences have confirmed the vital functions of ferroptosis-related genes in the entire ferroptosis process. Nevertheless, the specific influence of a single ferroptosis-related gene on the prognosis of a certain type of cancer remains unclear. We combined the mRNA expression levels of each gene with actual clinical characteristics to analyze the differential expression between LUAD and paracancerous specimens. Moreover, quantitative data of gene expressions were analyzed for their predictive potential in the survival of LUAD. Subgrouped by the risk score, identified genes between the high-risk group and low-risk group were analyzed for their differential variations, biological functions and pathways they were mainly enriched in. In conclusion, our screened DEGs in LUAD patients may be potential targets involved in the occurrence and development of LUAD.
In the constructed predictive model, a total of 5 genes (ACSL3, ACSL4, GSS, PEBP1, PGD) were involved in, which were closely associated with the process of ferroptosis. The ACSL3 gene is responsible for exogenous monounsaturated fatty acids to protect cells against ferroptosis, and it is negatively correlated with ferroptosis sensitivity . The ACSL4 gene is essential for proferroptosis. Knockdown of ACSL4 inhibits erastin-induced ferroptosis, and its overexpression can restore ferroptosis sensitization. Re-expression of flag-tagged human wild-type (WT) ACSL4 (ACSL4-Flag) in Acsl4 KO (Acsl4−/−) Pfa1 cells restores full sensitivity to ferroptosis induction, and knockdown of it significantly prolongs survival compared to vehicle-treated mice. Knockout of ACSL4 in ferroptosis-sensitive cells protects erastin- and RSL3-induced cell death , 25. The GSS gene provides instructions for making glutathione synthetase. The glutathione-dependent lipid hydroperoxidase GPX4 contributes to prevent ferroptosis by converting lipid hydroperoxides into non-toxic lipid alcohols. Overexpression of PEBP1 increases the sensitivity of HK2 cells to RSL3, and knockdown of PEBP1 in HAEC and HT22 cells yields an opposite result . The PGD gene is involved in erastin-induced ferroptosis .We modeled and calculated the corresponding risk score based on the expression data of 5 candidate genes, and then divided LUAD patients into high-risk group and low-risk group. Our established model effectively predicted the survival of LUAD patients in the training set, the validation set and the total cases. Meanwhile, the gender, age, and tumor stage of LUAD patients were taken into consideration, and thus a composite nomogram was established, which was much closer to the actual clinical practice. Among them, the gender and age of LUAD patients had relatively a small effect on the prognosis, whereas the tumor staging and risk score posed a greater one. The above results were consistent with our investigation expectations.
To explore the factors for the prognosis difference between the high-risk group and low-risk group, we compared the SNV levels and analyzed the differences in biological functions of the two groups. After subgrouping the acquired SNV statistics according to high-risk and low-risk scores, the mutation frequency of each gene and the number of mutations in each sample were calculated, which were displayed as the mutation landscape. The results revealed that LUAD patients in the high-risk group presented a higher frequency of each gene mutation. Moreover, the top 5 genes with the highest mutation frequency in the high-risk group, involving the TP53 (53%), TTN (50%), MUC16 (42%), CSMD3 (40%) and RYR2 (39%) genes presented 5–13% higher mutation frequency than those in the low-risk group. Calculating the mutation frequency easily identified the source of the variations between groups. Genomic analyses about the prognosis difference between high-risk and low-risk LUAD patients need to be performed in the future.
Functional enrichment analysis and immune infiltration score on DEGs between the two groups were further performed. DEGs were mainly enriched in the cytochrome P450, steroid hormones, IL-17 signaling pathways, staphylococcus aureus infection pathways, etc. The cytochrome P450 oxidoreductase (POR) is necessary for triggering the ferroptosis in cancer cells . Moreover, POR boosts the execution of ferroptosis by engaging in the peroxidative modification of phospholipids in the cell membrane. PPARα, as a member of the steroid hormone receptor superfamily, can inhibit iron overload and lead to ferroptosis by combining with the GPX4 and facilitating its expression. The preliminary evaluations indicated that the steroid hormones in adrenal sebaceous cells are significantly affected by ferroptosis induction . Iron plays a crucial role in the continuous evolution process for staphylococcus aureus by establishing efficient iron transportation systems. On the one hand, staphylococcus aureus consumes hemoglobin in the red blood cells of the host and serum through the transport systems encoded by the iron-regulated surface determinants located in cell wall. On the other hand, it acquires the iron by the siderophores with a high affinity for iron. A previous study has shown that the reactive oxygen species can result in the drug resistance in staphylococcus aureus infection . Previous GO enrichment revealed that biological functions like microtubule bundle formation and keratinization are mainly enriched in the iron [30, 31]. Therefore, we concluded that the variations in these pathways might be the inherent factors for the differences between the high-risk group and low-risk group. Our findings provide theoretical references for underlying the mechanism of ferroptosis in lung cancer patients.
Although constructing a prognostic model is of great significance in the TCGA-LUAD cohort, only internal verification and overall verification were performed in the verifying process of the model. Specifically, the external validation set was not included in the examination of the final results. In functional analysis, enrichment analysis was performed in DEGs between the high-risk group and low-risk group, while functions of these pathways were not be experimentally verified. As a result, we only identified which pathways and functions were responsible for triggering the prognosis difference in high-risk and low-risk LUAD patients, but how they induced it needs to be further explored.
In general, we established a prognostic prediction model based on 5 ferroptosis-related genes by analyzing the LUAD dataset. In the training and validation set, this model was found effectively predict the overall survival of LUAD, and more important, clinical features of LUAD patients were taken into consideration, which remarkably simulated the actual clinical practice. Our findings provided a promising tool in predicting the prognosis of LUAD patients, and theoretical references for explaining the prognosis difference between high-risk and low-risk LUAD.
This study constructed a novel LUAD prognosis prediction model based on 5 ferroptosis-related genes, which can provide a reliable prognostic evaluation tool for clinical practice and assist the clinical therapeutic decision.
Availability of data and materials
All data generated or analyzed during this study are included in this published article.
Dixon SJ, Lemberg KM, Lamprecht MR, et al. Ferroptosis: an iron-dependent form of nonapoptotic cell death. Cell. 2012;149(5):1060–72.
Santarpia M, Aguilar A, Chaib I, et al. Non-small-cell lung cancer signaling pathways, metabolism, and PD-1/PD-L1 antibodies. Cancers (Basel). 2020;12(6).
Allemani C, Matsuda T, Di Carlo V, et al. Global surveillance of trends in cancer survival 2000–14 (CONCORD-3): analysis of individual records for 37 513 025 patients diagnosed with one of 18 cancers from 322 population-based registries in 71 countries. Lancet. 2018;391(10125):1023–75.
. Pathway enrichment results of DEGs between high risk group and low risk group.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.