DNA methylation molecular subtypes for prognosis prediction in lung adenocarcinoma
BMC Pulmonary Medicine volume 22, Article number: 133 (2022)
Lung cancer is one of the leading causes of tumor-related mortality. Methylation differences reflect critical biological features of the etiology of lung adenocarcinoma (LUAD) and affect prognosis.
In the present study, we constructed a prognostic prediction model that integrates multiple DNA methylation markers derived from high-throughput omics data to improve prognostic evaluation.
In total, 21,120 methylation sites were identified in the training dataset, and 237 promoter genes were identified by genomic annotation of 205 CpG loci. We used the Akaike information criterion (AIC) to balance goodness of fit against overfitting. After AIC-based selection, two specific methylation sites, cg19224164 and cg22085335, remained. Prognostic analysis showed a significant difference between the high- and low-risk groups (P = 0.017). In particular, the hypermethylated group had a poor prognosis, suggesting that these methylation sites may be prognostic markers.
The model may help identify previously unknown biomarkers for predicting patient prognosis in LUAD.
Lung cancer is one of the leading causes of tumor-related mortality; in China, it ranks first in cancer morbidity among men and second among women [1, 2]. Factors contributing to increased lung cancer mortality include increased tobacco use, population aging, and atmospheric pollution. In 2018, lung cancer was estimated to have caused 1.8 million deaths [3,4,5]. The most common subtype of lung cancer is lung adenocarcinoma (LUAD); because it is usually diagnosed late, its five-year survival rate is reported to be only 15%. LUAD is sensitive to chemotherapy; however, the rapid emergence of chemoresistance results in the death of most patients [7, 8]. For advanced-stage LUAD, molecularly targeted therapies have been reported to increase survival rates; however, many patients lack a useful targetable mutation. LUAD usually arises from abnormal hyperplasia of the bronchial mucosa, followed by malignant infiltration and growth. Epigenetic alterations are closely associated with tumorigenesis, growth, and metastasis, and DNA methylation changes accumulate throughout the complex course of tumorigenesis and play a key role in regulating gene function in tumor cells. Methylation differences reflect critical biological features of the etiology of LUAD and affect prognosis. It is therefore necessary to identify novel, specific epigenetic targets to improve prognostic evaluation and survival and to enable the timely introduction of therapeutic drugs and treatment strategies for LUAD.
DNA methylation is a reaction catalyzed by DNA methyltransferases (DNMTs), in which a methyl group from S-adenosylmethionine (SAM) is transferred to cytosine to form 5-methylcytosine (5-mC). It occurs mainly at cytosine-phosphate-guanine (CpG) sites, and over two thirds of mammalian CpGs are methylated. Unmethylated CpGs, by contrast, tend to cluster in the promoter sequences of structural genes and around transcription start sites (TSSs), forming CpG islands. CpG islands are found in the 5′ regulatory regions of most genes, particularly in promoters. High levels of 5-methylcytosine in promoter regions can silence genes in tumor cells, suppressing their expression and causing cellular dysfunction. Chromosomal instability, particularly 17p loss, provides evidence for the accumulation of mutations and suggests that cancer-associated alterations favor the selection and expansion of precancerous lesions in LUAD. Methylation of several promoter sequences, including those of EGFR, KRAS, TP53, ECT2, S100A16, and AGTR1, has been associated with the onset and development of LUAD [13,14,15]. However, the clinical value of methylation at these genes has not been well examined in LUAD patients. Moreover, there is currently no systematic assessment of DNA methylation features for predicting overall survival (OS) in LUAD. In the present study, we built a prognostic signature that integrates DNA methylation data from high-throughput omics platforms to improve prognostic evaluation.
Materials and methods
We downloaded normalized FPKM RNA-Seq data and the corresponding clinical data for 486 cases from TCGA-LUAD (www.portal.gdc.cancer.gov/). Illumina Infinium methylation data were obtained from the UCSC Cancer Browser (www.genome.ucsc.edu/); methylation profiling was performed on the HumanMethylation27 and HumanMethylation450 BeadChip arrays for 150 and 503 patients, respectively. The level of each methylation site is expressed as a β value, ranging from 0 (unmethylated) to 1 (fully methylated). CpG loci with missing data in more than 70% of subjects were excluded. Cross-reactive CpG sites listed in "Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray" were also excluded. For the remaining sites, missing values were imputed with a k-nearest neighbor (KNN) estimator. Batch effects were removed with the ComBat algorithm in the R sva package. Unstable genomic sites, including CpG sites on sex chromosomes and single-nucleotide polymorphism sites, were removed. Because promoter DNA methylation regulates gene expression, we focused on promoter-region CpGs (from 2 kb upstream to 0.5 kb downstream of the transcription start site). In addition, we selected only cases with available gene expression profiles. In total, 458 subjects and 21,120 methylation sites were retained. The samples were divided into two groups: a testing set (27K BeadChip) and a training set (450K BeadChip).
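The missing-data filter, KNN imputation, and promoter window described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: β values are assumed to be stored as a dict mapping a CpG ID to a per-sample list with None for missing values, the KNN step is assumed to borrow values from the k most similar CpG rows (row-wise KNN imputation), and the helper names and the k = 3 default are hypothetical. ComBat batch correction and the cross-reactive-probe exclusion are not reproduced.

```python
import math

MISSING_CUTOFF = 0.70     # drop CpGs missing in more than 70% of subjects
PROMOTER_UPSTREAM = 2000  # 2 kb upstream of the TSS
PROMOTER_DOWNSTREAM = 500 # 0.5 kb downstream of the TSS


def drop_sparse_cpgs(beta, cutoff=MISSING_CUTOFF):
    """Keep CpG rows whose fraction of missing (None) values is at most `cutoff`."""
    return {cpg: vals for cpg, vals in beta.items()
            if sum(v is None for v in vals) / len(vals) <= cutoff}


def knn_impute(beta, k=3):
    """Hypothetical row-wise KNN imputation: each missing value is replaced by the mean
    of the same sample's value in the k CpG rows closest by RMS distance over the
    samples both rows observe. Only original (non-imputed) values are borrowed."""
    imputed = {}
    for cpg, values in beta.items():
        filled = list(values)
        for j, v in enumerate(values):
            if v is not None:
                continue
            candidates = []
            for other, ovals in beta.items():
                if other == cpg or ovals[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(values, ovals)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    candidates.append((dist, ovals[j]))
            nearest = sorted(candidates)[:k]
            if nearest:  # leave None if no neighbour observes this sample
                filled[j] = sum(val for _, val in nearest) / len(nearest)
        imputed[cpg] = filled
    return imputed


def in_promoter(cpg_pos, tss, strand):
    """True if the CpG lies between 2 kb upstream and 0.5 kb downstream of the TSS,
    taking gene strand into account."""
    offset = cpg_pos - tss if strand == "+" else tss - cpg_pos
    return -PROMOTER_UPSTREAM <= offset <= PROMOTER_DOWNSTREAM
```

For example, `drop_sparse_cpgs` would remove a CpG observed in only one of four samples (75% missing), and `in_promoter(1000, 2500, "+")` is True because the CpG sits 1.5 kb upstream of a plus-strand TSS.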
Identification of classification features
CpG loci that significantly affected survival were used as classification features. Univariate Cox regression models were fitted using methylation level, TNM, age, sex, stage, and survival information. CpGs that were significant in the univariate Cox regression were entered into a multivariate Cox regression model, and those that remained significant were selected as characteristic CpG loci. The risk score was calculated as $\text{risk score} = \sum_{i} c_i x_i$, where $x_i$ is the methylation β value of CpG $i$ and $c_i$ is its regression coefficient. The predictive accuracy of the prognostic signature was evaluated with receiver operating characteristic (ROC) curves.
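As a minimal sketch of the risk score and its ROC evaluation, the code below sums coefficient × β value over the selected CpGs and computes the area under the ROC curve in its rank (Mann–Whitney) form. It assumes a binary outcome (for example, death within a fixed follow-up horizon, chosen here purely for illustration); the study's exact ROC procedure for survival data is not specified, and the function names are hypothetical.

```python
def risk_score(sample, coefficients):
    """Linear risk score: sum of coefficient * methylation beta value over selected CpGs.
    `sample` maps CpG ID -> beta value; `coefficients` maps CpG ID -> Cox coefficient."""
    return sum(coef * sample[cpg] for cpg, coef in coefficients.items())


def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen event (label 1) has a higher score
    than a randomly chosen non-event (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of events from non-events, which is the scale on which the reported AUC values can be read.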
Identification of molecular subtypes
ConsensusClusterPlus (an R package) was used for consensus clustering, and LUAD subgroups were identified from the most variable CpG sites. The algorithm repeatedly subsamples items and features from the data matrix and partitions each subsample into K groups with k-means. Consensus is quantified as the stability of the cluster assignments obtained when the clustering method is applied to random subsets of the data. The number of clusters was chosen at the point beyond which the area under the cumulative distribution function (CDF) curve showed no obvious further change. To obtain a finer classification of LUAD, larger numbers of clusters were preferred. Consensus values are displayed with a color gradient from 0 (white) to 1 (dark blue), and the matrices are ordered so that items assigned to the same cluster are adjacent.
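The resampling idea above can be illustrated with a small, self-contained sketch in the spirit of ConsensusClusterPlus; it is not the package itself. Assumptions: k-means uses a deterministic farthest-first initialisation, 80% of items and features are drawn per resample, and the CDF area is approximated on a 100-point grid. All names and defaults here are hypothetical.

```python
import random


def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _sqdist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def consensus_matrix(data, k, reps=50, item_frac=0.8, feature_frac=0.8, seed=1):
    """M[i][j] = fraction of resamples containing both i and j in which k-means
    placed them in the same cluster (0.0 if never sampled together)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(reps):
        items = rng.sample(range(n), max(k, int(item_frac * n)))
        feats = rng.sample(range(d), max(1, int(feature_frac * d)))
        labels = kmeans([[data[i][f] for f in feats] for i in items], k)
        for a, i in enumerate(items):
            for b, j in enumerate(items):
                sampled[i][j] += 1
                together[i][j] += int(labels[a] == labels[b])
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def cdf_area(matrix, grid=100):
    """Riemann approximation of the area under the empirical CDF of the
    upper-triangle consensus values."""
    n = len(matrix)
    vals = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    step = 1.0 / grid
    return sum(sum(v <= x * step for v in vals) / len(vals) * step
               for x in range(1, grid + 1))
```

Running `consensus_matrix` for several values of k and comparing `cdf_area` across them mirrors the stopping rule described above: once the area stops changing noticeably as k grows, additional clusters add little stability information.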
Prognosis and functional analyses
Kaplan–Meier curves were used to display OS in the LUAD subgroups defined by DNA methylation profiles, and differences among clusters were assessed with the log-rank test. Associations between clinical or biological characteristics and DNA methylation clusters were analyzed with the chi-square test. KEGG and Gene Ontology (GO) enrichment analyses (molecular function (MF), biological process (BP), and cellular component (CC)) were used to annotate gene function, and the results were plotted with the R ggplot2 package. All tests were two-sided, and P < 0.05 was considered statistically significant.
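For readers who want to see the mechanics of the survival analysis, the sketch below implements the Kaplan–Meier estimator and a two-group log-rank test in plain Python. It is an illustrative assumption, not the analysis code used in the study (which compared up to seven clusters); the two-group restriction and the function names are hypothetical simplifications. Events are coded 1 for death and 0 for censoring.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = death, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


def logrank_two_groups(times, events, groups):
    """Two-group log-rank test; `groups` holds 0/1 labels.
    Returns (chi-square statistic, p-value with 1 degree of freedom)."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    # Upper tail of chi-square(1): P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

With ten patients where the five in group 1 all die before anyone in group 0, the statistic is about 9.7 and the p-value falls below 0.01, whereas two groups with identical event patterns give a statistic of 0 and a p-value of 1.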
Building the prognostic methylation site signature
A flow chart of this study is shown in Fig. 1. In the training dataset, 21,120 methylation sites were identified in total. Univariate Cox regression identified 1103 CpG sites as candidate DNA methylation biomarkers for OS. Multivariate Cox regression analysis of the 864 methylation sites, with tumor stage, sex, TNM, and age as covariates, identified 205 independent prognosis-related CpG sites. The clinicopathological properties of the samples are shown in Table 1. The median age at diagnosis was 60.5 years, and the median age of final exposure of the study subjects was 9.3 years.
Identifying DNA methylation subgroups
Consensus clustering of the 205 candidate prognostic methylation sites was applied to identify DNA methylation molecular subgroups. The number of clusters was determined from the area under the cumulative distribution function (CDF) curve and the consensus matrix. The CDF curve began to stabilize after k = 6 (Fig. 2A, B). To increase the prognostic value of the LUAD subgroups, we chose a larger number of clusters where possible. The consensus matrix for k = 7 shows a well-defined seven-block pattern (Fig. 2C). The corresponding heat map, annotated with TNM, sex, stage, age, and DNA methylation subgroup and ordered by the dendrogram in Fig. 2C, is shown in Fig. 2D. OS analysis revealed a statistically significant difference in prognosis among the seven groups (P < 0.05). Cluster 6 had the worst prognosis, whereas clusters 3 and 7 had the best (Fig. 3A). We then analyzed the composition of the seven clusters by TNM, stage, sex, and age (Fig. 3B–G). The relationships between characteristics and specific clusters were as follows: clusters 1, 4, and 5 had lower T stages; clusters 5 and 6 were enriched for advanced stage; clusters 1 and 7 had lower N stages; clusters 3 and 6 had more cases of higher M stage; cluster 4 was older; cluster 1 had more women; and cluster 3 had more men. These results suggest that each clinical factor is distributed differently across the clusters.
Identifying features of the DNA methylation clusters
A total of 237 promoter genes were identified by genomic annotation of the 205 CpG loci. We then used the R package clusterProfiler to perform functional enrichment analysis on these 237 genes. BP terms were mainly enriched in regulation of animal organ morphogenesis, meiotic nuclear division, cell differentiation, and several metabolic processes. MF terms were significantly enriched in protein/factor binding, bridging, and kinase/receptor–ligand activity. CC terms were mainly enriched in the ribosome, the postsynaptic specialization membrane, and other membrane regions. KEGG pathways were mainly enriched in metabolic pathways, Platinum drug resistance, Antifolate resistance, Apoptosis, ECM–receptor interaction, and the TNF and Ras signaling pathways (Fig. 4). We then analyzed the expression of the methylated genes across the subgroups; the expression heatmap is shown in Fig. 5A. Gene expression patterns varied among the subgroups, indicating that DNA methylation levels were reflected in the expression of these genes. A protein–protein interaction (PPI) network was constructed, and four hub genes (CCL25, PRMS17, NETO1, and RAD1) were identified with the MCODE plug-in of Cytoscape (Fig. 5B). Next, we searched for cluster-specific methylation sites, using the methylation sites as cluster features. With |log2FC| > 1 and P < 0.05 as selection criteria, each of the seven clusters was compared in turn against the other six clusters combined. Cluster 6 had the most specific sites, most of which were hypermethylated, and its methylation level was the highest among all clusters (Fig. 3H).
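Two computations in this paragraph can be made concrete with a short sketch. First, over-representation enrichment of the kind reported by clusterProfiler is commonly scored with a one-sided hypergeometric test; second, the one-versus-rest screen keeps CpGs with |log2FC| > 1 and P < 0.05. Both functions below are hypothetical illustrations: the per-site statistical test used in the study is not stated, so P values are assumed to be supplied from elsewhere, fold changes are assumed to be computed on mean β values, and a small pseudo-count (an added assumption) guards against division by zero.

```python
import math


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N genes in the universe, K annotated to the
    term, n genes in the query list, k of which are annotated to the term."""
    upper = min(n, K)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def one_vs_rest_sites(beta, labels, p_values, cluster,
                      lfc_cut=1.0, p_cut=0.05, eps=1e-6):
    """Select CpGs whose mean beta in `cluster` differs from the mean of all other
    samples by |log2FC| > lfc_cut with P < p_cut (P values supplied by the caller).
    Returns {cpg: ("hyper" or "hypo", log2FC)}."""
    selected = {}
    for cpg, values in beta.items():
        inside = [v for v, lab in zip(values, labels) if lab == cluster]
        outside = [v for v, lab in zip(values, labels) if lab != cluster]
        mean_in = sum(inside) / len(inside) + eps
        mean_out = sum(outside) / len(outside) + eps
        lfc = math.log2(mean_in / mean_out)
        if abs(lfc) > lfc_cut and p_values[cpg] < p_cut:
            selected[cpg] = ("hyper" if lfc > 0 else "hypo", lfc)
    return selected
```

Repeating `one_vs_rest_sites` once per cluster and counting the selected sites reproduces the logic by which the cluster with the most specific sites would be singled out.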
Constructing and verifying the signature
Cluster 6 was chosen as the seed cluster because it had the most specific methylation sites. Cluster 6 has 7 specific methylation sites, of which cg22085335 is hypomethylated and the others are hypermethylated. We used the Akaike information criterion (AIC) to balance goodness of fit against overfitting. After AIC-based selection, two specific methylation sites, cg19224164 and cg22085335, remained. The model formula is therefore: risk score = 2.71 × cg19224164 + 2.75 × cg22085335. Using the median risk score as the threshold, the samples were divided into high-risk and low-risk groups. Prognostic analysis showed a significant difference between the two groups (P = 0.017; Fig. 6A). The samples were then sorted by risk score to examine whether methylation levels changed systematically with the risk score (Fig. 6B). In particular, the hypermethylated group had a poor prognosis, suggesting that these specific methylation sites may serve as prognostic markers. The area under the curve (AUC) was 0.643, indicating acceptable model performance (Fig. 6C). Methylation levels at the specific sites increased significantly as the risk score increased. Finally, the model was applied to the test dataset to predict patient prognosis. There was also a significant difference in prognosis between the two groups in the test set (Fig. 7A, P = 8.305 × 10⁻⁵). The AUC in the test set was 0.788, suggesting that the model performs well (Fig. 7B). Because EGFR mutations are major driver mutations, we also examined them and found a positive correlation between EGFR mutations and the methylation status of cg19224164 and cg22085335 (Additional file 1: Fig. S1). In addition, we explored the impact of risk scores on patient OS in different clinical subtypes; the risk score was significantly correlated with survival in the female, stage III–IV, and T1–2 subtypes (Additional file 2: Fig. S2).
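The two-site model and the median split described above can be written out directly. The coefficients 2.71 and 2.75 come from the text; everything else is an assumption for illustration: samples are assumed to be dicts of β values keyed by CpG ID, samples exactly at the median are assigned to the low-risk group (the text does not say how ties are handled), and the AIC helper simply restates the standard definition AIC = 2k − 2 ln L used when comparing candidate models. Function names are hypothetical.

```python
import statistics

# Coefficients reported for the two CpGs retained after AIC-based selection.
COEFFICIENTS = {"cg19224164": 2.71, "cg22085335": 2.75}


def risk_score(sample):
    """risk score = 2.71 * beta(cg19224164) + 2.75 * beta(cg22085335)."""
    return sum(coef * sample[cpg] for cpg, coef in COEFFICIENTS.items())


def split_by_median(samples):
    """Label each sample 'high' if its score exceeds the cohort median, else 'low'.
    Returns (labels, cutoff); ties at the median go to 'low' (an assumption)."""
    scores = {sid: risk_score(beta) for sid, beta in samples.items()}
    cutoff = statistics.median(scores.values())
    return {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}, cutoff


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower values are preferred)."""
    return 2 * n_params - 2 * log_likelihood
```

Because the score is a weighted sum of β values with positive weights, samples with higher methylation at either site move toward the high-risk group, which is consistent with the hypermethylated group having the poorer prognosis.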
We also explored differences in somatic mutations between the low- and high-risk groups; TTN was the most frequently mutated gene (Additional file 3: Fig. S3). These findings are consistent with the results of the training dataset, supporting the stability and accuracy of the model's predictions.
Lung cancer is the leading cause of cancer-related mortality worldwide, resulting in over one million deaths each year. Adenocarcinoma is the most prevalent histological subtype of non-small cell lung cancer. Smoking is the leading cause of lung adenocarcinoma. However, as the number of smokers has decreased in many countries, the incidence of LUAD among non-smokers has increased. Although the 5-year survival rate of LUAD has improved in recent years owing to advances in surgery, radiotherapy, and chemotherapy, it remains unsatisfactory. Molecular methods could improve the management of LUAD because they usually do not require large tissue samples, which can expand the number of patients who can be assessed and reduce unnecessary surgical procedures. DNA methylation plays an important epigenetic role by reducing the activity of DNA segments and repressing gene transcription.
DNA methylation markers can improve prognostic assessment and predict therapy response, potentially extending patient survival. DNA methylation changes are an early event in tumorigenesis and are crucial in regulating gene expression in cancers. Therefore, for the early diagnosis of LUAD, epigenetic changes can be detected either alone or in combination with traditional biomarkers. One study identified eight probes corresponding to eight genes (AGTRL1, CTSE, EFNA2, ALDH1A3, BDKRB1, NFAM1, TMEM129, and SEMA4a) to predict survival in patients with early-stage LUAD, but it included only Asian and Caucasian populations. In another study, 6 differentially methylated genes (JDP2, PLG, SERPINA5, SEMG2, RFX5, and POLR3B) were identified to predict the prognosis of LUAD patients, but the study was restricted to stage I patients. Aberrantly methylated genes may serve as non-invasive biomarkers for early diagnosis, treatment selection, response assessment, and the application of novel therapies. We identified 205 independent prognostic CpG loci and 237 corresponding promoter genes. We also built a PPI network and identified four hub genes (CCL25, PRMS17, NETO1, and RAD1). C–C motif chemokine ligand 25 (CCL25) belongs to the CC subfamily of small cytokine genes, and its product binds the chemokine receptor CCR9. The CCR9–CCL25 axis has been reported to play a critical role in breast cancer (BC) cell survival and in reduced efficacy of cisplatin, primarily in a PI3K/Akt-dependent manner. Progesterone receptor modulators (PRMs) constitute a promising new class of hormonal drugs for BC treatment, and anti-proliferative effects of various PRMs have been reported. NETO1 regulates NMDA receptors (NMDARs) and kainate receptors (KARs), controlling synaptic transmission by acting as an auxiliary protein for these two types of ionotropic glutamate receptors in a synapse-specific manner.
The Rad9–Hus1–Rad1 protein complex is thought to respond to DNA damage and to play an indispensable role in the cell cycle. RAD9 inhibition can potentiate the cytotoxic effect of chemotherapy on BC cells. In addition, Rad1 deletion in mice has been reported to increase susceptibility to skin tumor development, probably because Rad1 helps maintain genomic integrity. Currently, there is no comprehensive study of the roles of CCL25, PRMS17, NETO1, and RAD1 in LUAD, and this study may provide key information for further in-depth studies.
KEGG pathway enrichment analysis indicated that these genes were mainly enriched in metabolic pathways, Platinum drug resistance, Antifolate resistance, Apoptosis, ECM–receptor interaction, and the TNF and Ras signaling pathways. Patients with stage II–IIIA LUAD generally receive platinum-based adjuvant chemotherapy (ACT) after surgical resection, yet only a 4–15% survival advantage after ACT has been observed. Van Laar constructed a 37-gene signature to identify patients with longer and shorter survival after platinum-based ACT and classified them as responders and non-responders, respectively. We therefore hypothesize that the 237 promoter genes play a key role in platinum drug resistance, which requires further research. Although methylation may be essential in LUAD, the specific methylated sequences in promoter regions that affect gene expression remain unclear. Moreover, the statistical and clinical significance of prognosis-associated methylation of these genes needs to be demonstrated in larger LUAD cohorts. In this study, we developed a classification model integrating multiple DNA methylation biomarkers to evaluate prognosis. This model may facilitate the identification of novel biomarkers, molecular subtype classification, and precision medicine targets in LUAD. It may also help with prognosis prediction, diagnosis, and treatment strategies for patients with different epigenetic subtypes of LUAD.
These signatures may reflect DNA methylation alterations and offer potentially useful targets for cancer treatment and for predicting therapy response. However, our signatures must be validated in further independent studies, and the function of the predictive DNA methylation sites must be confirmed experimentally. This study has limitations. First, the results have not yet been validated in clinical samples. Second, because the sample size was relatively small, the results do not provide precise clinical data. Finally, because of limited data, we could not examine the roles of cg19224164 and cg22085335 or the role of tobacco and alcohol use in LUAD. Although this study explores the possibility of establishing predictive models, the work is still at an early stage and needs further improvement. Nonetheless, cg19224164 and cg22085335 may be not only useful biomarkers but also potential therapeutic targets in LUAD.
In summary, prognosis-specific methylation sites were identified using the TCGA database and bioinformatics methods, and a prognostic prediction model was constructed for LUAD patients. The model may help identify novel biomarkers, predict prognosis, and support the clinical diagnosis and management of patients with distinct LUAD subtypes.
TCGA: The Cancer Genome Atlas database
Youlden DR, Cramb SM, Baade PD. The International Epidemiology of Lung Cancer: geographical distribution and secular trends. J Thorac Oncol. 2008;3:819–31.
Tan WL, Jain A, Takano A, Newell EW, Iyer NG, Lim WT, et al. Novel therapeutic targets on the horizon for lung cancer. Lancet Oncol. 2016;17:e347–62.
Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.
She J, Yang P, Hong Q, Bai C. Lung cancer in China: challenges and interventions. Chest. 2013;143:1117–26.
Zheng M. Classification and pathology of lung cancer. Surg Oncol Clin N Am. 2016;25:447–68.
Chen W, Zheng R, Baade PD. Cancer statistics in China, 2015. CA Cancer J Clin. 2016;66:115–32.
Murray N, Turrisi AT III. A review of first-line treatment for small-cell lung cancer. J Thorac Oncol. 2006;1:270–8.
Sarvi S, Mackinnon AC, Avlonitis N. CD133+ cancer stem-like cells in small cell lung cancer are highly tumorigenic and chemoresistant but sensitive to a novel neuropeptide antagonist. Cancer Res. 2014;74:1554–65.
Bjaanæs MM, Fleischer T, Halvorsen AR. Genome-wide DNA methylation analyses in lung adenocarcinomas: association with EGFR, KRAS and TP53 mutation status, gene expression and prognosis. Mol Oncol. 2016;10(2):330–43.
Ehrlich M, Gama-Sosa MA, Huang LH. Amount and distribution of 5-methylcytosine in human DNA from different types of tissues of cells. Nucleic Acids Res. 1982;10(8):2709–21.
Gu Y, Zhang CW, Wang L, Zhao Y, Wang H, Ye Q, Gao S. Association analysis between body mass index and genomic DNA methylation across 15 major cancer types. J Cancer. 2018;9:2532–42.
Sivakumar S, San Lucas FA, Jakubek YA. Genomic landscape of allelic imbalance in premalignant atypical adenomatous hyperplasias of the lung. EBioMedicine. 2019;42:296–303.
Zhou S, Wang P, Su X. High ECT2 expression is an independent prognostic factor for poor overall survival and recurrence-free survival in non-small cell lung adenocarcinoma. PLoS ONE. 2017;12(10):e0187356.
De C, Linjie L, Chao L. Aberrant S100A16 expression might be an independent prognostic indicator of unfavorable survival in non-small cell lung adenocarcinoma. PLoS ONE. 2018;13(5):e0197402-2.
Chen R, Hong Q, Jiang J. AGTR1 promoter hypermethylation in lung squamous cell carcinoma but not in lung adenocarcinoma. Oncol Lett. 2017;6:66.
Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature. 2014;511:543–50.
Kuo IY, Jen J, Hsu LH. A prognostic predictor panel with DNA methylation biomarkers for early-stage lung adenocarcinoma in Asian and Caucasian populations. J Biomed Sci. 2016;23(1):58.
Luo WM, Wang ZY, Zhang X. Identification of four differentially methylated genes as prognostic signatures for stage I lung adenocarcinoma. Cancer Cell Int. 2018;18(1):60.
Johnson-Holiday C, Singh R, Johnson EL. CCR9-CCL25 interactions promote cisplatin resistance in breast cancer cell through Akt activation in a PI3K-dependent and FAK-independent fashion. World J Surg Oncol. 2011;9(1):46.
Klijn J. Progesterone antagonists and progesterone receptor modulators in the treatment of breast cancer. Steroids. 2000;65:66.
Tang M, Pelkey KA, Ng D. Neto1 is an auxiliary subunit of native synaptic kainate receptors. J Neurosci. 2011;31(27):10009–18.
Yamamoto M, Nishiuma T, Kobayashi K. Rad9 is upregulated and plays protective roles in an acute lung injury model. Biochem Biophys Res Commun. 2008;376(3):594.
Yun H, Shi R, Yang Q. Overexpression of hRad9 protein correlates with reduced chemosensitivity in breast cancer with administration of neoadjuvant chemotherapy. Sci Rep. 2015;4(1):7548.
Han L, Hu Z, Liu Y. Mouse Rad1 deletion enhances susceptibility for skin tumor development. Mol Cancer. 2010;9(1):67.
Wallerek S, Sorensen JB. Biomarkers for efficacy of adjuvant chemotherapy following complete resection in NSCLC stages I–IIIA. Eur Respir Rev. 2015;24:340–55.
Van Laar RK. Genomic signatures for predicting survival and adjuvant chemotherapy benefit in patients with non-small-cell lung cancer. BMC Med Genomics. 2012;5:30.
The work was supported by Natural Science Foundation of Zhejiang Province (LY20H290001) and Wenzhou Public Welfare Science and technology program (Y20180211).
Ethics approval and consent to participate
All methods were carried out in accordance with relevant guidelines and regulations.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Additional file 1: Fig. S1. The correlation between EGFR mutations and the methylation status of cg19224164 and cg22085335.
Additional file 2: Fig. S2. The impact of risk scores on patient OS in different clinical subtypes.
Additional file 3: Fig. S3. The differences in somatic mutations between the low- and high-risk groups.
Table S1: Univariate Cox regression identified 1103 CpG sites as potential DNA methylation biomarkers for OS in LUAD patients.
Table S2: Multivariate Cox regression analysis of the 864 methylation sites, with T, N, M, stage, sex, and age as covariates, identified 205 independent prognosis-related CpG sites.
Table S3: Genomic annotation of the 205 CpG sites identified 237 corresponding promoter genes.
Table S4: Functional enrichment analysis of the 237 genes.
Table S5: Differential expression among the 7 clusters.
Cite this article
Xu, D., Li, C., Zhang, Y. et al. DNA methylation molecular subtypes for prognosis prediction in lung adenocarcinoma. BMC Pulm Med 22, 133 (2022). https://doi.org/10.1186/s12890-022-01924-0