An 8-ferroptosis-related genes signature from Bronchoalveolar Lavage Fluid for prognosis in patients with idiopathic pulmonary fibrosis

Background With the rapid advances of genetic and genomic technologies, the pathophysiological mechanisms of idiopathic pulmonary fibrosis (IPF) were gradually becoming clear, however, the prognosis of IPF was still poor. This study aimed to systematically explore the ferroptosis-related genes model associated with prognosis in IPF patients. Methods Datasets were collected from the Gene Expression Omnibus (GEO). The least absolute shrinkage and selection operator (LASSO) Cox regression analysis was applied to create a multi-gene predicted model from patients with IPF in the Freiburg cohort of the GSE70866 dataset. The Siena cohort and the Leuven cohort were used for validation. Results Nineteen differentially expressed genes (DEGs) between the patients with IPF and control were associated with poor prognosis based on the univariate Cox regression analysis (all P < 0.05). According to the median value of the risk score derived from an 8-ferroptosis-related genes signature, the three cohorts’ patients were stratified into two risk groups. Prognosis of high-risk group (high risk score) was significantly poorer compared with low-risk group in the three cohorts. According to multivariate Cox regression analyses, the risk score was an independently predictor for poor prognosis in the three cohorts. Receiver operating characteristic (ROC) curve analysis and decision curve analysis (DCA) confirmed the signature's predictive value in the three cohorts. According to functional analysis, inflammation- and immune-related pathways and biological process could participate in the progression of IPF. Conclusions These results imply that the 8-ferroptosis-related genes signature in the bronchoalveolar lavage samples might be an effective model to predict the poor prognosis of IPF. Supplementary Information The online version contains supplementary material available at 10.1186/s12890-021-01799-7.


Introduction
Characterized by fibrosis or structural deformations, idiopathic pulmonary fibrosis (IPF) is a chronic and progressive interstitial lung disease of unknown etiology [1,2]. In the United States, incidence of IPF in people aged 18-64 years was 6.1 new cases per 100,000 person-years [3]. Currently, pirfenidone and nintedanib have been approved by the United States Food and Drug Administration to treat patients with IPF [4,5]. However, the prognosis of IPF is still poor, the median survival time is usually 2-3 years after diagnosis [6,7], and the 5-year survival rate is less than 40% [8]. Therefore, it is very important to explore the novel prognostic models associated with prognosis in patients with IPF.
As a kind of special biological sample, bronchoalveolar lavage fluid (BALF) contains the accumulation of extravasated inflammatory cells and cytokines reflecting the microenvironment around the alveoli, and studies have verified that changes in the alveolar microenvironment are correlated with the progression of IPF [9]. A study has constructed a 4-genes signature of bronchoalveolar lavage cells, which may be an effective prognostic tool to deliver more personalized treatment decisions for patients with IPF [10]. Ferroptosis is a newly described form of caspase-independent regulated cell death (RCD) characterized by cellular accumulation of reactive oxygen species (ROS) driven through iron-dependent lipid [11,12]. Studies have found that ferroptosis was associated with the fibrosis of many organs such as liver, heart, and  lung, and the ROS accumulation and oxidative stress may be the primary inducer of ferroptosis in this process [13][14][15][16][17]. In addition, iron overload may cause lung fibrosis according to increased lipid peroxidation and decreased glutathione peroxidase 4 (GPX4) activity in lung tissues [18]. However, whether the ferroptosis-related genes are associated with prognosis of patients with IPF is unclear. Furthermore, the systematic exploration of the prognostic signature based on ferroptosis-related genes between the patients with IPF and subjects in the control group is also absent. Therefore, according to the dataset from the Gene Expression Omnibus (GEO), the study aimed to systematically explore the prognostic value of an 8-ferroptosisrelated genes signature in patients with IPF. Finally, we explored the possible mechanisms based on functional enrichment analysis.

Acquisition of data
The workflow of this study was shown in Fig. 1. On the GEO database (http:// www. ncbi. nlm. nih. gov/ geo/), we selected datasets must meet the following items: (1) the detected samples came from the bronchoalveolar lavage (BAL) of patients with IPF or control group; (2) raw data or a gene expression matrix should be provided; (3) the dataset must contain information regarding the prognostic status. Finally, one dataset, GSE70866 was identified (platform: GPL14550 and GPL17077). IPF diagnosis in this dataset was established by a multidisciplinary board at each institution based on the American Thoracic Society/European Respiratory Society criteria [19,20] and was confirmed to be consistent with guidelines published in 2011 [2]. Approval of the Ethics Committee was not required because the information of patients was obtained from the GEO.

Construction and validation of a prognostic ferroptosis-related gene signature
This study included a discovery cohort consisted of 20 normal people and 62 patients from Freiburg, Germany, and two independent validation cohorts: Siena, Italy (50 patients) and Leuven, Belgium (64 patients, Table 1). The differentially expressed genes (DEGs) between patients with IPF and controls were identified by R package "limma" with a false discovery rate (FDR) < 0.05 in the Freiburg cohort [21]. Cox regression analysis was performed for prognostic value of the ferroptosis-related genes according to R package "survival" (https:// github. com/ thern eau/ survi val, v.3.2-7) or SPSS Statistics 23 (IBM SPSS). Heatmap was constructed according to R packages "pheatmap" (v1.0.12). An interaction network for the ferroptosis-related DEGs with significant prognostic value was generated by the STRING database (version 11.0) [22] and visualized by Cytoscape (a software platform for visualizing complex networks, version v3.6.1). In order to minimize the risk of overfitting, a prognostic model was constructed by LASSO-penalized (least absolute shrinkage and selection operator) Cox regression analysis according to the R package "glmnet" (v.4.1-1) [23,24]. The independent variable in the Cox regression was the expression matrix of candidate ferroptosis-related DEGs with significant prognostic value, and the response variables were survival status of patients in the Freiburg cohort. Following the minimum criteria, penalty parameter (λ) for the model was determined by ten-fold cross-validation (the value of λ corresponding to the lowest partial likelihood deviance). The risk scores of the patients were calculated based on each gene's expression level and its corresponding regression coefficient as follows: score = sum (each gene's expression × corresponding coefficient). The patients were stratified into high-risk and low-risk groups according to the median value of the risk score. Based on the expression of genes in the predicted model, principal component analysis (PCA) was performed in the GraphPad Prism 9. Besides, t-distributed stochastic neighbor embedding (t-SNE) were carried out to seek the distribution of different groups using the "Rtsne" R package (v.0.15). The optimal cut-off expression value of each gene of the model was determined for the survival analysis by the "surv_cutpoint" function of the R package "survminer" (https:// cran.r-proje ct. org/ web/ packa ges/ survm iner/ index. html, v.0.4.9). The time-dependent ROC curve was constructed to evaluate the predictive value of the risk score according to the R package "survivalROC" (https:// CRAN.R-proje ct. org/ packa ge= survi valRO C,v.1. 0.3).
Decision curve analysis (DCA) was performed to evaluate whether the prognostic model could help improve clinical decision-making. The y-axis represents the net benefit, and the x-axis indicates the probability of the threshold value. By using "stdca.R" (http:// www. decis ioncu rvean alysis. org/) [25], DCA was used to determine the clinical usefulness of the signature.

Statistical analysis
R software (Version 4.0.3) and SPSS Statistics 23 (IBM SPSS) was used for statistical analysis. When the data were normally distributed, means for continuous variables were compared according to independent group t tests [described as mean ± SD (standard deviation)]; otherwise, the Mann-Whitney test was used (data were described as median [interquartile range [IQR])]. Categorical variables were described as number (%) and were compared by Chi-square test or the Fisher exact test. Kaplan-Meier analysis with the log-rank test was used to compare the survival between different groups. Univariate and multivariate Cox regression was used to estimate the hazard ratio (HR) to identify predictors of mortality. Some statistical analyses were visualized by GraphPad Prism 9. Bilateral test (the test level α = 0.05) was used.

Identification of prognostic ferroptosis-related DEGs in the Freiburg cohort
Of the 293 ferroptosis-related genes, 51 (17.4%) were differentially expressed between patients with IPF and controls, and 19 of them were correlated with prognosis according to the univariate Cox regression analysis (Fig. 2a-c). The interaction network among the 19 genes was showed in Fig. 2d.

Construction of a prognostic model in the Freiburg cohort
A prognostic model was established by LASSO Cox regression analysis using the expression profile of the 19 genes mentioned above. An 8-ferroptosis-related genes signature was constructed based on the optimal value of λ (Additional file 1: Figure S1), and the survival analyses of the 8 genes according to the optimal cut-off expression value of each gene were showed in the Additional file 2: Figure S2. The risk score was calculated as follows: 0.447056157 * expression level of NRAS + 0.008853087 * expression level of EMP1 + 0.283483044 * expression level of SLC40A1 + 0.308932641 * expression level of MYC + 0.191156392 * expression level of ANGPTL4 + 0.312166561 * expression level of PRKCA + 0.072692152 * expression level of MUC1-0.200150039 * expression level of GABARAPL1. According to the median cut-off value, the patients were stratified into a high-risk group (n = 31) and a low-risk group (n = 31) (Fig. 3a). The risk score was significantly positively correlated with GAP score and mortality in the Freiburg cohort (Table 2). PCA and t-SNE analysis demonstrated the patients in different risk groups were distributed in two directions (Fig. 3b, c). Patients in the high-risk group had a higher probability of death earlier than those in the low-risk group (Fig. 3d, e). Time-dependent ROC curves was used to evaluated the predictive performance of the risk score for mortality, and the area under the curve (AUC) reached 0.869 at 1 year, 0.845 at 2 years, 0.83 at 3 years and 0.936 at 5, 7 years (Fig. 3f ).
The decision curves of the prognostic model in the Freiburg cohort were shown in Additional file 3: Figure  S3a. The black solid line indicates no intervention, and the net benefit is zero. The gray solid line represents the intervention, at which the net benefit is an oblique line with a negative slope. The red dotted line represents the realized profits of the 8-ferroptosis-related genes model. The decision curve showed that, at threshold values of 0.2-0.5, the greatest clinical benefit will be obtained from the prognostic model with 8-ferroptosis-related genes.

Validation of the signature in the Siena and Leuven cohort
Survival analyses of the 8 genes in the ferroptosis-related signature showed that these genes were associated with poor prognosis in the Siena cohort and the Leuven cohort (all P < 0.05, Additional files 4, 5: Figures S4, S5). To test the robustness of the signature constructed from the Freiburg cohort, the patients from the Siena and Leuven cohorts were also divided into high-or low-risk groups respectively according to the median value of risk score calculated with the same formula in the Freiburg cohort (Figs. 4a, 5a, respectively). The patients in the high-risk group were also associated with higher mortality in the Siena cohort and Leuven cohort, respectively ( Table 2). PCA and t-SNE analysis demonstrated that patients in two subgroups were distributed in discrete directions (Figs. 4b, c, 5b, c). Consistently, high-risk patients were more likely to encounter death earlier and had higher mortality compared with low-risk patients (Figs. 4d, e, 5d, e). In addition, the AUC of the 8-genes signature was 0.815 at 1 year, 0.849 at 2 years, and 0.750 at 3 years in Siena cohort (Fig. 4f ), and, the AUC of the 8-gene signature was 0.813 at 1 year, 0.806 at 2 years, 0.835 at 3 years and 0.632 at 5 years in Leuven cohort (Fig. 5f ). Furthermore, The DCA showed that the risk score was more likely to have better clinical benefit than GAP (Additional file 3: Figure S3b-c).     Fig. 6).

Functional analyses in the Freiburg and the Siena cohort
In order to reveal the underlying biological functions and pathways that were correlated with the risk score, GO enrichment and KEGG pathway analyses were used to perform the DEGs between the high-risk and low-risk groups in the Freiburg and the Siena cohort. DEGs were enriched in inflammation-and immune-related GO and KEGG pathways such as receptor ligand activity, signaling receptor activator activity, cytokine activity, G protein-coupled receptor binding, CCR chemokine receptor binding, cytokine-cytokine receptor interaction, and chemokine receptor binding (Fig. 7) etc. As to Leuven cohort, the number of differential genes obtained under the same criteria was too small to perform functional enrichment analysis, so it was excluded.
To further explore the potential association between immune status and the risk score, ssGSEA was applied to quantify the enrichment scores of diverse immune cell subpopulations, related functions or pathways in the three cohorts. The scores of APC (antigen presenting cell) co-stimulation, parainflammation and inflammatory response were significantly higher in the patients of highrisk group compared with patients of low-risk group in the three cohorts (Fig. 8).

The signature for the diagnosis of IPF
Compared with controls, risk score was significantly higher in the patients with IPF (P < 0.001, Additional file 6: Figure S6a). Based on the ROC curve analysis, the optimal cut-off value (9.148) of risk factor might have potential values in helping clinicians to identify patients with IPF, with a sensitivity of 90.3% and a specificity of 85.0% (AUC: 0.925, 95% CI 0.861-0.989; P < 0.001, Additional file 6: Figure S6b). According to ssGSEA, the dysfunction of immune response may be observed in the patients with IPF (Additional file 6: Figure S6c, d).

Discussion
IPF is a chronic and serious lung disease. Lung tissuebased molecular genomic models [29,30] have been found to be correlated with prognosis in patients with IPF. However, the procedure of lung biopsy is dangerous, which limits the applicability of such genomic signatures. The BAL cells gather on the outer surface of the alveoli and are usually collected by bronchoscope, and their gene expression is predictive of survival in IPF [9]. Therefore, the construction of the multi-gene signature of BAL cells is very important for the prediction of prognosis in patients with IPF. In this study, 51 ferroptosis-related DEGs were identified in IPF samples compared to normal controls in the Freiburg cohort from GSE70866 dataset, and 19 DEGs of them were associated with poor prognosis of IPF. We constructed a novel prognostic model integrating 8 ferroptosis-related DEGs and validated it in two external cohorts. In addition, the ROC curve confirmed the predictive value of risk score for prognosis. Functional analyses showed that inflammation-and immunerelated pathways were enriched. Furthermore, according to DCA, using risk score or combination of risk score and GAP got more benefit compared with GAP only, suggesting that the 8-ferroptosis-related genes model can improve making clinical decisions to some extent.
Although several previous studies [15,31] have suggested that a few genes might regulate ferroptosis in IPF, their relevance to survival in IPF patients is still largely unknown. Oxidative stress is thought to be involved in the development of alveolar damage, inflammation and fibrosis [32]. It has been found that in patients with IPF, lipid peroxidation and DNA oxidation are increased while antioxidant markers such as glutathione and Haem oxygenase (HO)-1 are reduced [12,17]. We speculated that an imbalance of oxidants and antioxidants in the organism leads to increased production of ROS and altered iron homeostasis, which triggers ferroptosis characterized by iron-accumulation and participates in the progress of IPF. In this study, the 8-ferroptosis-related genes signature was associated not only with the diagnosis but also with the prognosis of IPF, which suggested that ferroptosis might participate the development and progression of IPF. The prognostic model proposed in this study was consisted of 8 ferroptosis-related genes (NRAS, EMP1, SLC40A1, MYC, ANGPTL4, PRKCA, MUC1, GABARAPL1). In the current study, seven of the genes (NRAS, EMP1, SLC40A1, MYC, ANGPTL4, PRKCA, MUC1) in the prognostic model were up-regulated, while GABARAPL1 was down-regulated. Furthermore, these genes could be roughly divided into four categories, including inflammation or immune response (NRAS, PRKCA, MYC, SLC40A1), protein binding (GABARAPL1, EMP1), p53 binding (MUC1), angiogenesis (ANGPTL4) according to DAVID (http:// david. abcc. ncifc rf. gov/) [33]. In terms of inflammation or immune response, the activation of PRKCA is involved in a calcium-mediated process participating in cystic fibrosis [34]. NRAS encodes GTPases involved in cell growth, proliferation and differentiation, and its protein products lead to downstream signaling events [35,36]. MYC may promote the exacerbation of pulmonary fibrosis according to immune regulation [37,38], and be a key gene according to the interaction network. MYC produces c-myc proto-oncoprotein, which acts down-stream of multiple growth factor signaling pathways, and MYC amplification was significantly associated with squamous cell lung carcinoma (SCC) in IPF patients [38]. SLC40A1 encodes Ferroportin which has an important role in the hypoferremic response to inflammation [39,40]. P53 was considered to a novel regulatory factor in the process of ferroptosis with the dual effects on ferroptosis through transcriptional or post-translational mechanism [15,41]. MUC1 may participate in the ferroptosis regulation by influencing p53 expression. ANGPTL4 (an angiopoietinlike protein belonging to a superfamily of secreted proteins) is involved in angiogenesis, which could be related to the development of pulmonary fibrosis [42]. A study [43] found that EMP1 could mediate cell density-regulated ferroptosis. GABARAP family members are involved in autophagosome maturation, and compared to normal tissues, the reduced expression of GABARAPL1 has been reported in various cancer cell lines [44]. The link between these results and ours remains to be determined, which need further study for the verification.
Although the underlying mechanisms of pulmonary diseases susceptibility to ferroptosis have been an intense research area in the past few years, the potential regulatory mechanism between IPF immunity and ferroptosis is still elusive. Based on the DEGs between different risk groups, functional enrichment analysis was performed, and we discovered that the inflammation-and immunerelated pathways were enriched. Actually, inflammation-and immune-related pathways play vital role in the development of IPF, and immune processes can coordinate existing fibrotic responses [30,45]. Interestingly, in this study, there was a significant difference in the process of antigen presentation between the low-risk group and the high-risk group. One possible theory is that ferroptosis cells emit different signals to attract antigen-presenting cells (APCs) to the site where cells die of ferroptosis [46]. Importantly, the pathogenesis of IPF is accompanied by a mass proliferation of macrophages. By changing phenotypes, such as classic activated macrophages (M1) and alternative activated macrophages (M2), macrophages participate in the pathogenesis of IPF and maintain the homeostasis of the lung environment [47]. In the Freiburg cohort, significant differences in M1 macrophages and M2 macrophages were observed between the low-risk group and high-risk group, however, there were no significant difference in the Leuven cohort and Siena cohort, which may be caused by the differences of patients or groups. In addition, higher risk scores were associated with impaired immune function, including the activity of the type II IFN response and T cell exhaustion as well as the fractions of Treg cells. Therefore, weakened immunity of high-risk patients may be a reason for their poor prognosis.
There are some limitations in this study. First, the model was constructed and validated based on the retrospective data from GEO, and the samples of every cohort were relatively small. A larger-sample prospective studies are needed to test its clinical application. Second, the inherent disadvantage of building a prediction model by considering only a single feature is inevitable.
Many important prognostic genes in IPF might have been excluded. Third, clinical parameters such as lung function, treatment measures, underlying diseases and so on were absent in the datasets, thereby, the represent mean of the signature was limited. Forth, the treatment of patients with IPF was unknown in the three cohorts, which may affect outcome. Finally, it should be stressed that the relations between the risk score and immune activity have not yet been addressed in experiments.

Conclusion
In conclusion, this study established a novel prognostic signature of 8 ferroptosis-related genes. The risk score might be an effective model to predict the poor prognosis of IPF. In addition, the potential mechanisms between prognosis and inflammation-and immunerelated response in IPF remain poorly understood, which needs further investigation.  Table S1. The 293 ferroptosis-related genes used in the study.
Additional file 8: Table S2. The annotated gene set file used in ssGSEA.