Cluster features in fibrosing interstitial lung disease and associations with prognosis

Background Clustering can help identify subtypes of complex fibrosing interstitial lung disease (F-ILD) and associate them with prognosis at an early stage of the disease, which may improve treatment management. We aimed to identify associations between clinical characteristics and outcomes in patients with F-ILD.
Methods In this retrospective study, 575 of 926 patients with F-ILD were eligible for analysis. Four clusters were identified from baseline data using cluster analysis, and clinical characteristics and outcomes were compared among the groups.
Results Cluster 1 was characterized by a high prevalence of comorbidities and hypoxemia at rest, with the worst baseline lung function; Cluster 2 by young female patients with little or no smoking history; Cluster 3 by male patients with the heaviest smoking history, the most frequent velcro crackles and finger clubbing, and the most severe lung involvement on chest imaging; and Cluster 4 by male patients with a high percentage of occupational or environmental exposure. Clusters 1 (median overall survival [OS] = 7.0 years) and 3 (OS = 5.9 years) had shorter OS than Clusters 2 (OS = not reached; Cluster 1: p < 0.001, Cluster 3: p < 0.001) and 4 (OS = not reached; Cluster 1: p = 0.004, Cluster 3: p < 0.001). Clusters 1 and 3 also had a higher cumulative incidence of acute exacerbation than Clusters 2 (Cluster 1: p < 0.001, Cluster 3: p = 0.014) and 4 (Cluster 1: p < 0.001, Cluster 3: p = 0.006). Cluster membership independently predicted acute exacerbation (p < 0.001) and overall survival (p < 0.001).
Conclusions Four clusters based on clinical characteristics underscore the high degree of heterogeneity of F-ILD and may help predict the risk of fibrosis progression, acute exacerbation, and overall survival.
Supplementary Information The online version contains supplementary material available at 10.1186/s12890-023-02735-7.


Introduction
Fibrosing interstitial lung disease (F-ILD) encompasses a heterogeneous group of diseases of various etiologies, including idiopathic pulmonary fibrosis (IPF), connective tissue disease-associated interstitial lung disease (CTD-ILD), idiopathic non-specific interstitial pneumonia (iNSIP), fibrotic hypersensitivity pneumonitis (FHP), unclassifiable idiopathic interstitial pneumonia (uIIP), and occupationally induced ILDs [1,2]. Accurate diagnosis is essential for the management of these diseases. However, the highly heterogeneous disease course often delays correct diagnosis and antifibrotic treatment in clinical practice.
Researchers have sought to identify patients with irreversible, rapidly progressive fibrosis, which may indicate higher mortality [3]. The term "progressive fibrosing ILD (PF-ILD)", used in a landmark clinical trial, was proposed to cover several diseases featuring an increase in the extent of pulmonary fibrosis documented on high-resolution computed tomography (HRCT), a decline in lung function, worsening respiratory symptoms, and a high risk of early mortality despite available treatments, with a clinical course similar to that of IPF [1,2,4,5]. Recently, the updated American Thoracic Society (ATS)/European Respiratory Society (ERS)/Japanese Respiratory Society (JRS)/Asociación Latinoamericana de Tórax (ALAT) guideline suggested using the term "progressive pulmonary fibrosis (PPF)" instead; however, the criteria for PF-ILD and PPF differ [6]. It is difficult to predict what proportion of patients with non-IPF ILDs will develop a progressive fibrotic phenotype, but baseline predictors of a high risk of progression and mortality have been investigated, such as male sex, increasing age, smoking, and an HRCT-documented usual interstitial pneumonia (UIP) pattern [2,3,5,7-9].
Clustering based on baseline clinical features is an emerging approach for revealing novel subtypes of F-ILD that have distinct clinical features and are associated with prognosis and treatment. Patients with similar characteristics are grouped into the same cluster, so that each cluster is as homogeneous as possible internally and as distinct as possible from the other clusters [10]. Previously, in chronic ILDs, clinical characteristics identified by cluster analysis showed considerable predictive accuracy for clinical outcomes, including progression-free survival, transplant-free survival, and acute exacerbation (AE) [11]. However, that study did not include patients with occupational or environmental ILDs, although occupational and environmental dust exposure is an important clustering variable. In addition, the possible relationship between clusters and fibrosis progression has not been explored.
To elucidate potential relationships between clusters and outcomes, including fibrosis progression, AE, and all-cause mortality, we applied cluster analysis to a patient cohort from a regional tertiary referral center specializing in the management of ILDs. We also compared clinical characteristics among patients who met different criteria for fibrosis progression.

Study population and design
Patients aged ≥ 18 years with F-ILD seen at Beijing Chao-Yang Hospital, China, between January 1, 2016, and January 1, 2021, were retrospectively screened for the study. Diagnoses were based on multidisciplinary discussion following ATS/ERS guidelines [12,13]. Patients were enrolled if their baseline medical records, pulmonary function tests, and HRCT findings were accessible and they had at least two documented follow-up visits at intervals of less than one year. The exclusion criteria were: (1) < 10% fibrosis on baseline HRCT; (2) a diagnosis of pulmonary embolism, decompensated heart failure, lower respiratory tract infection, or acute respiratory distress syndrome at baseline; (3) lung cancer at baseline; (4) missing baseline data; and (5) loss to follow-up (Supplemental Material Figure S1). The study was approved by the Ethics Committee of Beijing Chao-Yang Hospital (2020-KY-437) and performed in accordance with the principles of the Declaration of Helsinki.
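The screening rules above can be expressed as a simple record filter. The sketch below is a minimal Python illustration under stated assumptions, not the authors' procedure: the `Patient` record and its field names are hypothetical, the study window is treated as inclusive at both ends, and "follow-up intervals of less than one year" is read as every gap between consecutive visits (starting from the first visit) being shorter than 365 days. Loss to follow-up is captured only indirectly, through the two-visit requirement.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Study window from the text; treating both ends as inclusive is an assumption.
STUDY_START = date(2016, 1, 1)
STUDY_END = date(2021, 1, 1)


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    age: int
    first_visit: date
    followup_visits: List[date]
    fibrosis_extent_pct: float   # % of lung with fibrosis on baseline HRCT
    excluded_condition: bool     # PE, decompensated HF, LRTI, ARDS or lung cancer at baseline
    complete_baseline: bool      # baseline records, PFTs and HRCT all available


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion and exclusion rules described in the text."""
    if p.age < 18 or not (STUDY_START <= p.first_visit <= STUDY_END):
        return False
    if not p.complete_baseline or p.fibrosis_extent_pct < 10 or p.excluded_condition:
        return False
    visits = sorted(p.followup_visits)
    if len(visits) < 2:
        return False  # fewer than two documented follow-up visits
    # Assumed reading: every gap between consecutive visits is under one year.
    timeline = [p.first_visit] + visits
    return all((later - earlier).days < 365 for earlier, later in zip(timeline, timeline[1:]))
```

In practice each boolean field would itself be derived from chart review; collapsing them into flags simply keeps the eligibility logic readable.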

Data collection
The electronic medical records were reviewed to extract pertinent data from the first visit, including age, sex, body mass index (BMI), symptoms, signs, smoking history, occupational and environmental exposures (based on each patient's reported history), and comorbidities (pulmonary hypertension [defined as a mean pulmonary artery pressure greater than 20 mmHg on right heart catheterization], diabetes, coronary heart disease, hypertension, hypothyroidism, and gastroesophageal reflux disease [GERD]). Laboratory test results, pulmonary function measures (forced vital capacity [FVC], percent predicted FVC [FVC% pred], and percent predicted diffusing capacity of the lung for carbon monoxide [DLCO% pred]), and HRCT images were available for all patients. Laboratory data included arterial blood gas results, antinuclear antibody (ANA) titers, routine blood counts, derived blood cell count inflammation indexes (neutrophil-to-lymphocyte ratio, monocyte-to-lymphocyte ratio, and platelet-to-lymphocyte ratio), and serum tumor markers (squamous cell carcinoma [SCC] antigen, carcinoembryonic antigen [CEA], cytokeratin 19 fragment 21-1 [CYFRA21-1], carbohydrate antigen 125 [CA125], and neuron-specific enolase [NSE]). Hypoxemia was defined as a partial pressure of arterial oxygen of less than 80 mmHg on an arterial blood gas test at rest. Two radiologists blinded to the clinical data independently reviewed the HRCT scans, and disagreements were resolved by consensus. An HRCT-documented UIP-like pattern was defined as a definite or probable UIP pattern according to the clinical practice guideline for IPF [6,12].
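The derived inflammation indexes and the hypoxemia flag are simple functions of the laboratory values. A minimal sketch, assuming all cell counts are absolute counts reported in the same unit (for example 10^9/L); the function names are hypothetical and chosen only for illustration.

```python
def blood_cell_indices(neutrophils: float, lymphocytes: float,
                       monocytes: float, platelets: float) -> dict:
    """Ratios of absolute counts; all inputs assumed to share one unit."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {
        "NLR": neutrophils / lymphocytes,  # neutrophil-to-lymphocyte ratio
        "MLR": monocytes / lymphocytes,    # monocyte-to-lymphocyte ratio
        "PLR": platelets / lymphocytes,    # platelet-to-lymphocyte ratio
    }


def is_hypoxemic(pao2_mmhg: float) -> bool:
    """Study definition: resting arterial PaO2 below 80 mmHg."""
    return pao2_mmhg < 80.0
```

For example, a neutrophil count of 4.0, lymphocytes 2.0, monocytes 0.5, and platelets 250 give NLR 2.0, MLR 0.25, and PLR 125.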
We reviewed each case to record treatments. Treatment choices were at the discretion of the physicians, and a treatment was documented if it was ever used for at least one month. Treatment regimens consisted of corticosteroids, as monotherapy or in combination with immunosuppressive drugs (cyclophosphamide, mycophenolate mofetil, cyclosporine, azathioprine, or tacrolimus), antifibrotic therapy (nintedanib or pirfenidone), and long-term oxygen therapy (at least 15 h per day). Each patient's Gender-Age-Physiology (GAP) score was calculated and assigned to the corresponding GAP stage. The GAP index was preferred over the ILD-GAP index because some patients had ILD diagnoses that did not correspond to the ILD categories of the ILD-GAP index [14,15].
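The text does not restate how the GAP score is computed. The sketch below follows the point thresholds of the published GAP model as we understand them (sex; age ≤ 60, 61-65, > 65 years; FVC% pred > 75, 50-75, < 50; DLCO% pred > 55, 36-55, ≤ 35, or unable to perform; stages I = 0-3, II = 4-5, III = 6-8). These thresholds are not given in this paper and should be checked against reference [14] before use; the function names are hypothetical.

```python
from typing import Optional


def gap_score(male: bool, age: float, fvc_pct: float,
              dlco_pct: Optional[float]) -> int:
    """GAP points; dlco_pct=None means the patient could not perform DLCO.

    Thresholds assumed from the published GAP model, not from this study.
    """
    gender = 1 if male else 0
    if age <= 60:
        age_pts = 0
    elif age <= 65:
        age_pts = 1
    else:
        age_pts = 2
    if fvc_pct > 75:
        fvc_pts = 0
    elif fvc_pct >= 50:
        fvc_pts = 1
    else:
        fvc_pts = 2
    if dlco_pct is None:
        dlco_pts = 3  # unable to perform the test
    elif dlco_pct > 55:
        dlco_pts = 0
    elif dlco_pct > 35:
        dlco_pts = 1
    else:
        dlco_pts = 2
    return gender + age_pts + fvc_pts + dlco_pts


def gap_stage(score: int) -> str:
    """Map a GAP score (0-8) to stage I, II or III."""
    if score <= 3:
        return "I"
    if score <= 5:
        return "II"
    return "III"
```

For instance, a 70-year-old man with FVC 60% pred and DLCO 40% pred scores 1 + 2 + 1 + 1 = 5 (stage II).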

Follow-up and outcomes of the study
Follow-up data were obtained from outpatient follow-up records (usually every 3-6 months), hospitalization records, and telephone interviews; the follow-up period ended on January 1, 2022. The study outcomes were (1) occurrence of fibrosis progression, (2) occurrence of AE, and (3) all-cause mortality. AE was defined as an acute, clinically significant respiratory deterioration developing within less than one month, accompanied by new radiologic abnormalities on HRCT, such as diffuse, bilateral ground-glass opacification, and without an obvious clinical cause such as fluid overload, left heart failure, or pulmonary embolism [16,17]. Overall survival (OS) was measured from the first visit to death from any cause.
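For survival analysis, each patient contributes a follow-up time and an event indicator. A minimal sketch, assuming (not stated in the text) that patients alive at the end of follow-up are censored at the earlier of their last contact and January 1, 2022, and that time is expressed in years of 365.25 days; the function name is hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

FOLLOWUP_END = date(2022, 1, 1)  # end of follow-up stated in the text


def os_time_event(first_visit: date, death_date: Optional[date],
                  last_contact: date) -> Tuple[float, int]:
    """Return (years from first visit, 1 if death observed else 0)."""
    if death_date is not None and death_date <= FOLLOWUP_END:
        end, event = death_date, 1
    else:
        # Assumed censoring rule: earlier of last contact and study end.
        end, event = min(last_contact, FOLLOWUP_END), 0
    return (end - first_visit).days / 365.25, event
```

A death recorded after January 1, 2022 is treated here as censored at that date, since it falls outside the observation window.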

Fibrosis progression
Two different definitions of fibrosis progression were applied in the cohort. First, PPF was defined as at least two of three criteria (worsening symptoms, radiological progression, and physiological progression) occurring within 12 months (Criteria A) [6]. Physiological progression was defined as either (1) an absolute decline in FVC% pred > 5% or (2) an absolute decline in DLCO% pred > 10% within 12 months of follow-up [6]. Notably, although the guideline defines PPF in patients with ILD other than IPF, our cohort included patients with IPF. Second, patients meeting any of the following criteria within a 24-month period were considered to have experienced fibrosis progression (Criteria B): (a) a relative decline of ≥ 10% in FVC% pred; (b) a relative decline of ≥ 15% in DLCO% pred; or (c) worsening symptoms or a worsening radiological appearance accompanied by a ≥ 5% to < 10% relative decline in FVC% pred [1].
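The two definitions differ mainly in absolute versus relative declines and in how many domains must be met. The sketch below encodes the thresholds exactly as stated above; it assumes the caller has already compared baseline with a follow-up measurement inside the relevant window (12 months for Criteria A, 24 months for Criteria B) and that symptom and radiological worsening are supplied as clinician-judged booleans. The `Change` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    baseline: float   # % predicted at baseline
    follow_up: float  # % predicted at a follow-up visit within the window

    @property
    def absolute_decline(self) -> float:
        """Drop in percentage points of predicted."""
        return self.baseline - self.follow_up

    @property
    def relative_decline(self) -> float:
        """Drop as a percentage of the baseline value."""
        return 100.0 * (self.baseline - self.follow_up) / self.baseline


def meets_criteria_a(fvc: Change, dlco: Change,
                     worse_symptoms: bool, worse_imaging: bool) -> bool:
    """PPF (Criteria A): at least 2 of 3 domains within 12 months."""
    physiological = fvc.absolute_decline > 5 or dlco.absolute_decline > 10
    return sum([worse_symptoms, worse_imaging, physiological]) >= 2


def meets_criteria_b(fvc: Change, dlco: Change,
                     worse_symptoms: bool, worse_imaging: bool) -> bool:
    """Fibrosis progression (Criteria B): any one criterion within 24 months."""
    if fvc.relative_decline >= 10 or dlco.relative_decline >= 15:
        return True
    return (worse_symptoms or worse_imaging) and 5 <= fvc.relative_decline < 10
```

For example, FVC falling from 80% to 74% pred is a 6-point absolute decline (physiological progression under Criteria A) but only a 7.5% relative decline, so under Criteria B it counts only when accompanied by worsening symptoms or imaging.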

Statistical analysis
Statistical analyses were performed using SPSS Statistics, version 26 (IBM, Inc., Chicago, IL, USA). After excluding patients with missing baseline data, complete data from 575 patients were used for cluster analysis. Based on previous literature, 17 variables with substantial clinical relevance were selected for inclusion in the two-step cluster analysis [1,11,18]: male sex, age, BMI, heavy smoking, exposure history, hypoxemia, signs (velcro crackles and finger clubbing), lung function (FVC, FVC% pred, and DLCO% pred), HRCT features (UIP-like pattern and emphysema), ANA titers, and comorbidities (pulmonary hypertension, diabetes, and coronary heart disease). The log-likelihood distance was used as the similarity measure.
To determine the optimal number of clusters, the candidate cluster solutions were compared using Schwarz's Bayesian Criterion as the clustering criterion, and four clusters were determined automatically. The silhouette coefficient was 0.5, indicating fair cluster quality; when the number of clusters was set to three or five, the silhouette coefficient was below 0.5. Discriminant function analysis was performed to validate the clustering results, and 93.7% of the original grouped cases were correctly classified.
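The silhouette coefficient summarizes how much closer each patient is to its own cluster than to the nearest other cluster: for a point with mean within-cluster distance a and mean distance b to the nearest other cluster, s = (b - a) / max(a, b), averaged over all points (range -1 to 1). The sketch below is a generic Python illustration using Euclidean distance on numeric features; SPSS TwoStep with the log-likelihood distance computes its silhouette measure on its own distance, so this code would not reproduce the reported 0.5. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_distance(point: Sequence[float], others: List[Sequence[float]]) -> float:
    """Average Euclidean distance from point to each member of others."""
    return sum(math.dist(point, q) for q in others) / len(others)


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette width over all points, using Euclidean distance."""
    if len(set(labels)) < 2:
        raise ValueError("need at least two clusters")
    members = {c: [i for i, lab in enumerate(labels) if lab == c] for c in set(labels)}
    scores: List[float] = []
    for i, (p, c) in enumerate(zip(points, labels)):
        own = [points[j] for j in members[c] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_distance(p, own)  # cohesion: mean distance within own cluster
        b = min(mean_distance(p, [points[j] for j in members[k]])
                for k in members if k != c)  # separation: nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Comparing this value across candidate numbers of clusters mirrors the check described above, in which three- and five-cluster solutions gave lower silhouette values than four.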
Continuous data are expressed as means (standard deviation, SD) or medians (interquartile range), depending on the distribution, and categorical data as numbers (percentages). The Mann-Whitney U test, Kruskal-Wallis test, or one-way ANOVA was used for continuous variables, and the chi-squared test or Fisher's exact test for categorical variables, with Bonferroni post hoc tests for pairwise comparisons between groups. Survival curves were constructed using the Kaplan-Meier method. Survival was assessed using unadjusted log-rank tests and univariate and multivariable Cox proportional hazards regression. All statistical tests were two-sided, and a p-value < 0.05 was considered statistically significant.
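The Kaplan-Meier (product-limit) estimator multiplies, at each distinct event time t, the conditional survival 1 - d/n, where d is the number of deaths at t and n the number still at risk just before t; censored patients leave the risk set without causing a step. A minimal stdlib sketch (not the SPSS implementation) that also reports the median OS, returning None when the curve never falls to 0.5, which corresponds to "not reached" in the results:

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct event time (1 = death, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest event time with S(t) <= 0.5; None means 'not reached'."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

With times [1, 2, 2, 3, 4, 5] and events [1, 1, 0, 1, 0, 0], survival steps to 5/6, 2/3, and 4/9, so the median survival time is 3.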
To assess baseline disease severity among the clusters, we evaluated the GAP score (Table 1). The median GAP score was the same in Clusters 1, 3, and 4 (GAP = 3), whereas Cluster 2 had a lower median score than the other clusters (GAP = 2, p < 0.001).
In univariate Cox regression analysis, predictors of AE and OS included phenotypic cluster (p < 0.001), diagnosis category (p < 0.001), and GAP score (p < 0.001). After adjustment for diagnosis category, corticosteroid use, immunosuppressive therapy, and antifibrotic treatment, multivariable analysis indicated that phenotypic cluster (p < 0.001) and GAP score (p < 0.001) independently predicted OS and AE (Table 4).

Discussion
In this retrospective cohort study, a heterogeneous group of patients with F-ILD was divided into four clusters according to clinical and comorbidity variables present before clinical diagnosis. After forming the clusters, we compared clinical features, laboratory data, and clinical outcomes between clusters to identify the group of patients with the poorest prognosis.
Cluster 1, in which 53.6% of patients had autoimmune features, had an equal sex distribution, a higher prevalence of comorbidities common in older people and of hypoxemia at rest, and the worst baseline lung function of the four clusters. As in Cluster 1, 54.3% of patients in Cluster 2 had autoimmune features; this cluster consisted mainly of young female patients with little or no smoking history. Cluster 3 included predominantly male patients with the highest rate of smoking, and velcro crackles, finger clubbing, and the UIP pattern were most frequent in this cluster. In addition, we identified a cluster (Cluster 4) with the highest percentage of occupational or dust exposure history. This cluster had the highest proportion of male patients and a younger median age, and clinical signs, a UIP-like fibrotic pattern, and emphysema on HRCT were least frequent in it. Regarding prognosis, Clusters 1 and 3 had a higher risk of AE and shorter OS, consistent with clinical experience. The prevalence of fibrosis progression was also higher in Clusters 1 and 3 under both definitions.
Cluster analysis is a valuable approach for investigating respiratory diseases [19,20], such as asthma [21], chronic obstructive pulmonary disease [22], idiopathic pulmonary arterial hypertension [23], sarcoidosis [24], and IPF [25], as it allows distinct characteristics to be identified across heterogeneous disease courses. A cluster analysis of patients with idiopathic interstitial pneumonia and emphysema concluded that the cluster consisting mostly of patients with IPF had significantly higher mortality [26]. Another cluster analysis was performed in a cohort of 770 patients with chronic ILD (37% diagnosed with IPF) [11]. Patients with the most rapid decline in lung function, increased fibrosis progression, and poor survival tended to cluster together, and a group of elderly white male smokers with severe honeycombing had high mortality, similar to Cluster 3 in our study. However, that study did not include patients with ILD of known etiology, for which occupational exposure is an essential factor. By including patients with occupationally induced ILD on an exploratory basis, our clustering approach is clinically feasible and practical. Patients in Cluster 4 in our study were characterized by a non-UIP pattern on HRCT and the highest frequency of occupational exposure history, and the primary underlying diagnosis in this cluster was occupationally related ILD. Patients with occupationally related ILDs generally progress slowly, which may account for the better OS observed in this cluster [27,28]. Conversely, Cluster 1, characterized by a high prevalence of CTD-ILD, had poorer survival. One plausible explanation is that outcomes in this cluster were influenced by lower baseline lung function and comorbid conditions such as diabetes, a known risk factor for the development of IPF and for vascular complications [29].
Patients with a rapid decline in lung function, an increased extent of pulmonary fibrosis on chest imaging, and worsening symptoms have a higher risk of AE and poorer survival [2,9]. A previous clinical trial reporting a beneficial effect of antifibrotic treatment in non-IPF ILDs with fibrosis progression used Criteria B to define fibrosis progression [1,4,30], and the updated Criteria A were published more recently [6]. Here, we applied both definitions. In total, 26.1% of patients fulfilled Criteria A and 41.6% fulfilled Criteria B. No statistically significant differences in clinical features were found between these groups. However, the proportions of patients with AE and death were higher among those meeting Criteria A than among those meeting Criteria B, suggesting that the newly updated definition may be more sensitive to poor prognosis.
Given the similar clinical features and prognosis of PPF and IPF, the acceptance of antifibrotic treatment for IPF prompted investigation of such treatment in other fibrotic lung diseases [2,8,31,32]. Clinical trials such as INBUILD [4], RELIEF [32], and the trial of pirfenidone in uIIP [33] reported a beneficial effect of antifibrotic treatment in non-IPF ILDs with PPF. In INBUILD, the annual rate of decline in FVC in patients with PF-ILD was significantly lower among patients who received nintedanib (-80.8 mL/year) than among those who received placebo (-187.8 mL/year) [4]. In the cohort as a whole, OS was shorter in patients who received antifibrotic treatment than in those who did not; after stratification by cluster and by progression criteria, no difference was found in any subgroup (Figure S2). Higher mortality among patients receiving antifibrotic therapy is not unexpected in this retrospective cohort, because use of the medication varied with patients' compliance, socio-economic status, and medical insurance, as well as with physicians' prescribing decisions [34]. Treatment choices could be included as clustering variables in future prospective cohort studies.
This study has several strengths. In contrast to prior studies [11,21], we specifically included occupationally induced ILDs of known etiology and assessed the presence of fibrosis progression across clusters. In occupationally related ILDs such as chronic silicosis and asbestosis, most patients experience slow progression [27,28]. Next, our inclusion of PF-ILD/PPF patients and our comparison of patients meeting PF-ILD criteria with those meeting PPF criteria were novel and provided valuable additional findings. Finally, through cluster analysis, this study provided comprehensive information, including demographics, physiological values, chest imaging, laboratory values, and complications.
Several limitations should be considered. First, the single-center, retrospective design may have introduced selection and recall bias, and excluding cases with missing values from the cluster analysis undoubtedly introduced selection bias. The aims of this study and the interpretation of its results should therefore be considered with caution. Second, although we used clinical variables representing patients' demographic, historical, physical, laboratory, and radiographic information, the number and optimal combination of variables used for clustering need to be validated. A previous study indicated that other variables, such as race and GERD, could contribute to clustering [11] and would have provided important information about ILD outcomes. However, this was a single-ethnicity study, and GERD was not reliably recorded in our medical system: GERD was sometimes diagnosed so that patients' use of proton pump inhibitors would be eligible for government health insurance coverage. Furthermore, considering potential differences in disease progression due to distinct exposures, we compared the clinical features of patients with chronic silicosis and asbestosis and found no statistically significant differences in prognosis between these groups (Table S6 and Figure S4). Given the limited sample size when stratifying by disease type, a more comprehensive exposure dataset will be needed.

Conclusion
Using baseline clinical features, our study identified four clusters within the heterogeneous group of patients with F-ILD.
The four clusters differed in laboratory data and in the risks of fibrosis progression, AE, and death, highlighting substantial heterogeneity between clusters and homogeneity within clusters; these clusters may help predict clinical prognosis and guide management strategies for these patients. Further prospective, multicenter studies are warranted to optimize the clustering variables and the clinical classification of F-ILD.

Table 1
Demographics and clinical characteristics of the patients in four clusters

Table 3
Clinical outcomes of the patients in four clusters

Table 4
Variables predicting acute exacerbation and overall survival: adjusted for phenotypic cluster, ILD subtype, corticosteroid use, immunosuppressive agents, anti-fibrotic treatment, and disease categories
a: Reference category: Cluster 2
b: Reference category: IPF
c