- Research article
- Open Access
A bioinformatics analysis to identify novel biomarkers for prognosis of pulmonary tuberculosis
BMC Pulmonary Medicine volume 20, Article number: 279 (2020)
Pulmonary tuberculosis (PTB) is a highly infectious respiratory disease to which the population is broadly susceptible and which is difficult to treat. This study aimed to identify novel, effective biomarkers to improve the prognosis and treatment of PTB patients.
Bioinformatics analysis was first performed to identify PTB-related differentially expressed genes (DEGs) from the GEO database. The DEGs were then subjected to GO annotation and KEGG pathway enrichment analysis for an initial description of their functions. Clustering analysis was subsequently conducted to identify PTB-related gene clusters, and the relevant PPI networks were established using the STRING database.
Further differential and clustering analyses identified 10 DEGs that were down-regulated during PTB development, and these were considered candidate hub genes. In addition, a retrospective review of relevant studies showed that 7 of these genes (CCL20, PTGS2, ICAM1, TIMP1, MMP9, CXCL8 and IL6) were closely associated with PTB development and had the potential to serve as biomarkers.
Overall, this study provides a theoretical basis for research on novel PTB biomarkers and may help in estimating PTB prognosis and exploring targeted molecular treatment.
Tuberculosis (TB) is a chronic infectious disease caused by Mycobacterium tuberculosis (MTB). It carries relatively high morbidity and mortality and has become a serious global public health threat (www.who.int/tb/publications/global_report/en/). According to statistics reported by the World Health Organization in 2019, there were approximately 10 million newly diagnosed TB cases and about 1.4 million deaths worldwide (including HIV-positive people), with the highest death toll in low- and middle-income countries (http://apps.who.int/iris). Pulmonary tuberculosis (PTB) is the most common form of TB, and PTB-related deaths can largely be prevented through early and effective diagnosis. Therefore, mining potential biomarkers associated with PTB occurrence and development is vital for early diagnosis, prognosis assessment and individualized treatment of PTB.
Clinically, biomarkers that can predict responses before treatment starts or monitor therapeutic responses during follow-up are crucial for PTB treatment. Such biomarkers can potentially identify patients with a high bacterial load and/or an enhanced inflammatory response, allowing doctors to provide more intensive surveillance and effective long-term therapeutic strategies. As an alternative to sputum examination, serum-based biomarkers have attracted much attention in recent years. Unlike sputum, serum is relatively easy to collect and remains an available source of biomarkers throughout treatment. In addition, serum-derived inflammatory and infectious markers can be quantified, and multiple biomarkers can be combined into a predictive signature, which can greatly increase predictive accuracy [4,5,6,7]. Recently, some biomarkers have been shown to be implicated in PTB occurrence and development and can be used clinically for PTB prognosis. For instance, Klassert TE et al. found that serum MASP1 was significantly increased in PTB patients, which affected lectin pathway complement activity in vitro, and that it could be involved in PTB occurrence during MTB pathogenesis. In addition, Suzuki Y et al. found elevated serum sCD206 in PTB patients, which was closely related to prognosis and has been recognized as a potential biomarker. Nevertheless, effective biomarkers related to PTB development are still needed, and identifying them is of great significance for global PTB control.
This study applied bioinformatics analysis to PTB gene expression profiles from the GEO database and identified PTB-related hub genes via clustering analysis and PPI networks. The functions of these hub genes and their associations with PTB occurrence and development were then analyzed, which helps to identify genes with potential value for PTB treatment and prognosis estimation.
An expression matrix relevant to PTB was obtained from the GEO database. To be eligible, an expression microarray dataset had to include healthy controls, TB samples and post-treatment samples (n ≥ 30). The GSE54992 microarray dataset was eventually selected for this study. It comprises 39 samples in total, classified as HC (healthy controls, n = 6), LTBI (latent tuberculosis infection, n = 6), TB/TB0 (tuberculosis/0 months after initiation of anti-TB chemotherapy, n = 9), TB3 (3 months after initiation of anti-TB chemotherapy, n = 9) and TB6 (6 months after initiation of anti-TB chemotherapy, n = 9).
First, the expression data from the GSE54992 microarray were processed with the KNN algorithm in R and then normalized. The “limma” package was used for differential analysis of the normalized data to identify differentially expressed genes (DEGs) in the TB vs LTBI and TB vs HC comparisons, with thresholds of |log2FC| > 1.5 and FDR < 0.05. The DEGs shared by both comparisons (overlapping DEGs) were retained for subsequent analysis.
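The thresholding and overlap step described above can be sketched as follows. This is a minimal illustration in Python rather than the R/limma workflow used in the study; the gene-level values, the placeholder gene names (GENE_A, GENE_B, GENE_C) and the helper `select_degs` are hypothetical and are not taken from GSE54992.

```python
from typing import Dict, Set, Tuple

# Hypothetical limma-style results: gene -> (log2 fold change, FDR-adjusted p-value).
# All numbers are invented for illustration only.
Result = Dict[str, Tuple[float, float]]


def select_degs(results: Result, lfc_cutoff: float = 1.5, fdr_cutoff: float = 0.05) -> Set[str]:
    """Return genes with |log2FC| > lfc_cutoff and FDR < fdr_cutoff."""
    return {
        gene
        for gene, (log2fc, fdr) in results.items()
        if abs(log2fc) > lfc_cutoff and fdr < fdr_cutoff
    }


tb_vs_ltbi: Result = {
    "IL6": (-2.1, 0.001),
    "MMP9": (-1.8, 0.010),
    "GENE_A": (0.9, 0.002),   # fails the fold-change cutoff
    "GENE_B": (2.4, 0.200),   # fails the FDR cutoff
}
tb_vs_hc: Result = {
    "IL6": (-2.6, 0.0005),
    "MMP9": (-1.2, 0.004),    # fails the fold-change cutoff in this comparison
    "GENE_C": (1.9, 0.030),
}

degs_ltbi = select_degs(tb_vs_ltbi)
degs_hc = select_degs(tb_vs_hc)

# Overlapping DEGs: genes that pass both comparisons (the Venn intersection).
overlapping_degs = degs_ltbi & degs_hc
print(sorted(overlapping_degs))
```

Only genes that pass both cutoffs in both comparisons survive the intersection, which is why a gene that is significant in one contrast alone is excluded.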
Enrichment analysis on the overlapping DEGs
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed on the overlapping DEGs using the “clusterProfiler” package. In the GO analysis, gene annotations describe the biological role of a gene product in terms of three aspects: molecular function (MF), biological process (BP) and cellular component (CC). FDR < 0.05 was set as the significance threshold.
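For readers unfamiliar with over-representation analysis, the sketch below shows the kind of calculation that typically underlies such enrichment results: a one-sided hypergeometric test per term, followed by Benjamini-Hochberg adjustment to obtain an FDR. It is an assumption that this mirrors the package's internals closely enough for illustration; the function names and the toy counts are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_enrichment_p(n_universe: int, n_term: int, n_degs: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_universe genes, of which n_term are annotated to the term."""
    upper = min(n_term, n_degs)
    total = comb(n_universe, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 20 genes in the universe, 5 annotated to a term,
# 4 DEGs of which 3 carry the annotation.
p_term = hypergeom_enrichment_p(20, 5, 4, 3)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(round(p_term, 4), [round(a, 3) for a in adjusted])
```

A term would be reported as enriched when its adjusted value falls below the FDR threshold of 0.05 stated above.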
TCseq is a package that provides a unified suite for analyzing different types of time course sequencing data. In this study, the TCseq package was employed to group the overlapping DEGs into six clusters (K = 6), and the genes in each cluster were then subjected to GO annotation and KEGG enrichment analysis.
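The idea of grouping genes by the shape of their expression trajectory can be sketched as below. This is not the TCseq implementation: it is a plain k-means on z-scored profiles, with K = 2 and fixed initial centers chosen only to keep the toy example deterministic (the study used K = 6). The expression values and gene names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore(profile: Sequence[float]) -> List[float]:
    """Standardize one gene's time course so that only its shape matters."""
    mu, sd = mean(profile), pstdev(profile)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in profile]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles: List[List[float]], init_centers: List[List[float]], n_iter: int = 20) -> List[int]:
    """Basic Lloyd iteration; returns one cluster label per profile."""
    centers = [list(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in profiles
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


# Hypothetical mean expression at TB0, TB3 and TB6 (invented values).
expr = {
    "GENE_UP_1": [2.0, 4.0, 6.0],
    "GENE_UP_2": [1.0, 1.5, 2.0],
    "GENE_DOWN_1": [9.0, 6.0, 3.0],
    "GENE_DOWN_2": [5.0, 4.0, 3.5],
}
genes = list(expr)
profiles = [zscore(expr[g]) for g in genes]

# Assumption: seed the two centers with the first rising and first falling profile.
labels = kmeans(profiles, init_centers=[profiles[0], profiles[2]])
print(dict(zip(genes, labels)))
```

Standardizing each profile first means that genes are grouped by trend (rising or falling after treatment starts) rather than by absolute expression level.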
Protein-protein interaction (PPI) network construction
The Search Tool for the Retrieval of Interacting Genes/Proteins database (STRING; https://string-db.org/) is a public database of known and predicted protein-protein interactions. Analysis of protein-protein interactions (PPIs) is indispensable for research on protein function, as it helps to clarify how proteins interact. In this study, the STRING database was used to construct a PPI network with an interaction score threshold of > 0.4. The network was then visualized using Cytoscape (version 3.7.0).
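Applying the interaction score cutoff amounts to filtering an edge list before building the network, as in the sketch below. The edge scores are invented for illustration and are expressed on a 0 to 1 scale as an assumption; they are not values retrieved from STRING, and `build_network` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical scored interactions: (protein A, protein B, interaction score).
edges: List[Tuple[str, str, float]] = [
    ("IL6", "CXCL8", 0.99),
    ("IL6", "MMP9", 0.95),
    ("MMP9", "TIMP1", 0.99),
    ("IL6", "GENE_X", 0.25),     # below the cutoff, dropped
    ("GENE_Y", "GENE_Z", 0.40),  # equal to the cutoff, dropped because the rule is strict
]


def build_network(scored_edges: List[Tuple[str, str, float]], min_score: float = 0.4) -> Dict[str, Set[str]]:
    """Keep edges with score > min_score and return an undirected adjacency map."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in scored_edges:
        if score > min_score:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


network = build_network(edges)
print({node: sorted(neighbours) for node, neighbours in network.items()})
```

Proteins whose interactions all fall at or below the cutoff do not appear as nodes, so the threshold controls both the edge count and the node count of the resulting network.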
Identification of DEGs in PTB
Differential analysis was performed on the gene expression data from the PTB microarray dataset GSE54992. In total, 431 DEGs were identified in TB vs LTBI (212 up-regulated and 219 down-regulated genes) and 491 DEGs in TB vs HC (241 up-regulated and 250 down-regulated genes), as shown in Fig. 1a and b. A Venn diagram showed 309 overlapping DEGs (Fig. 1c), which were used for follow-up analysis.
Enrichment analysis on the overlapping DEGs
GO and KEGG enrichment analyses were conducted to explore the biological functions of the 309 overlapping DEGs. In the GO analysis, these DEGs were mainly enriched in functions associated with inflammation and immunoregulation. The 10 most enriched biological processes were leukocyte migration, cell chemotaxis, neutrophil mediated immunity, regulation of inflammatory response, T cell activation, regulation of MAP kinase activity, acute inflammatory response, cellular response to interleukin-1, B cell activation and macrophage activation (Fig. 2a). In addition, KEGG analysis suggested that these DEGs were predominantly enriched in the NF-kappa B signaling pathway, TNF signaling pathway, Toll-like receptor signaling pathway, IL-17 signaling pathway, complement and coagulation cascades and other pathways closely related to inflammation and immunity (Fig. 2b). Collectively, these results indicate that the 309 overlapping DEGs act predominantly in inflammatory and immunoregulatory processes during PTB occurrence and development.
Clustering analysis and further enrichment analysis
After this preliminary characterization of the biological functions of the overlapping DEGs, clustering analysis was conducted for more in-depth study. As shown in Fig. 3a, the DEGs were grouped into 6 clusters. In samples from patients treated with anti-TB chemotherapy, the expression of genes in Cluster 1 first decreased and then increased, reaching its minimum at the third month, whereas the genes in Cluster 2 showed the opposite trend. The expression of genes in Cluster 3 and Cluster 4 increased over time. Conversely, the expression of genes in Cluster 5 and Cluster 6 declined over time. GO and KEGG enrichment analyses were then performed. No results met the threshold for the genes in Clusters 1, 2 and 6, and only the genes in Cluster 4 showed a close association with PTB. KEGG analysis showed that the genes in Cluster 4 were mainly enriched in the NF-kappa B signaling pathway, TNF signaling pathway, Toll-like receptor signaling pathway, IL-17 signaling pathway and other immune-related pathways, and GO analysis revealed major immune functions such as T cell activation, apoptotic cell clearance, leukocyte chemotaxis and acute inflammatory response (Fig. 3b and c). Genes in Cluster 4 were therefore selected for further analysis.
PPI network construction and hub gene identification
DEGs in Cluster 4 were projected onto a STRING network for functional enrichment analysis. A PPI network with 39 nodes in total was then established at an interaction score threshold of > 0.4 (Fig. 4a). The 10 genes with the highest node degree were defined as hub genes and are listed in Fig. 4b. Differential and clustering analyses showed that these hub genes were all down-regulated during PTB development (detailed in the Supplementary Table) and then up-regulated after patients underwent anti-TB chemotherapy. On this basis, we reasoned that the top 10 genes might play an inhibitory role in PTB progression.
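Ranking nodes by degree, as used above to define hub genes, can be sketched in a few lines. The adjacency map below is a hypothetical toy network, not the 39-node network from Fig. 4a, and breaking ties alphabetically is an assumption made only so that the output is deterministic.

```python
from typing import Dict, List, Set

# Hypothetical undirected PPI network given as an adjacency map.
adjacency: Dict[str, Set[str]] = {
    "IL6": {"CXCL8", "MMP9", "ICAM1"},
    "MMP9": {"IL6", "TIMP1"},
    "CXCL8": {"IL6"},
    "ICAM1": {"IL6"},
    "TIMP1": {"MMP9"},
}


def top_hubs(network: Dict[str, Set[str]], n: int = 10) -> List[str]:
    """Return the n nodes with the highest degree (number of neighbours).
    Ties are broken alphabetically, which is an illustrative choice."""
    degree = {node: len(neighbours) for node, neighbours in network.items()}
    return sorted(degree, key=lambda gene: (-degree[gene], gene))[:n]


hubs = top_hubs(adjacency, n=2)
print(hubs)
```

With `n=10` on the full Cluster 4 network, the same ranking would yield a top-10 list of the kind reported in Fig. 4b.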
Great progress has reportedly been made in the epidemic control of PTB owing to the implementation of the National TB Control Programme (2011–2015). However, despite the reduction in the prevalence of smear-positive PTB cases (170/100,000 vs 57/100,000), the burden of drug-resistant PTB is still sizable, which prompted us to explore effective biomarkers for improving current PTB treatment [12, 13]. Several studies have already sought PTB-related biomarkers for early diagnosis or prognosis estimation. For instance, Zhao et al. combined bioinformatics analysis with clinical biochemical examination and found that the gene expression and serum protein levels of SLAMF8, LILRB4 and IL-10Ra were all significantly elevated in PTB patients, and that all three genes were associated with poor prognosis. Phillips et al. identified 10 MTB metabolites among the volatile organic compounds (VOCs) in breath, which were markedly increased and could be used as biomarkers for PTB diagnosis. The present study adopted bioinformatics methods to identify DEGs in PTB from the GEO database. These DEGs were then subjected to clustering analysis and projected onto a PPI network to screen for candidate hub genes (CCL20, F3, THBS1, PTGS2, PLAU, ICAM1, TIMP1, MMP9, CXCL8 and IL6) closely associated with PTB occurrence and development. To clarify whether these hub genes have the potential to serve as PTB biomarkers, we retrospectively reviewed relevant research on PTB.
C-C motif chemokine ligand 20 (CCL20) is a specific ligand of C-C motif chemokine receptor 6 (CCR6) that functions under multiple pathological conditions. Cytokines and chemokines have both been reported to participate in protective immunity and immunopathogenesis of TB, as well as in MTB host-pathogen interactions. Lee JS et al. investigated the level of CCL20 and its regulatory mechanism in PTB cases and healthy controls, and found that CCL20 was up-regulated in PTB patients and regulated by proinflammatory cytokines. PTGS2 (prostaglandin-endoperoxide synthase 2), also known as cyclooxygenase-2 (COX-2), is an enzyme responsible for generating the intermediate PGH. In TB-infected macrophages, PGH-induced repair of plasma membrane damage is crucial. Moreover, the mechanism by which MTB regulates COX-2 expression in macrophages is reported to be an important factor in the initiation or maintenance of the host immune response. Wang L et al. revealed that COX-2 inhibition could suppress the macrophage apoptosis induced by a secreted MTB lipoprotein. Rand L et al. reported that COX-2 could inhibit the p38MAPK-PG signaling pathway to decrease MMP-1 activity, which could be considered a therapeutic target for attenuating inflammatory tissue damage in PTB. ICAM1 (intercellular adhesion molecule 1; CD54), a member of the immunoglobulin super family (Igsf), is necessary for cell adhesion and plays an important role in inflammation-induced tissue adhesion, tumor metastasis and the immune response. Du SS et al. used a protein microarray technique to identify differentially expressed proteins associated with PTB diagnosis, and found that ICAM1 had relatively high sensitivity and specificity and had the potential to serve as an indicator for the diagnosis of sputum-negative PTB.
MMP-9 has been shown by Taylor JL et al. to be involved in macrophage recruitment and granuloma formation, and early MMP activity is a crucial part of resistance to MTB infection in the lung. Specifically, MMP-9 is required for macrophage recruitment and tissue remodeling during PTB progression. The inflammatory cytokine CXCL8 (C-X-C motif chemokine ligand 8) can be released during macrophage activation to help establish the immune network, and it has been found to be up-regulated in PTB patients. Blok DC et al. described CXCL8 as an innate immune regulator in patients with active PTB. IL6 (interleukin 6) is regarded as a biomarker for predicting death in HIV-negative PTB patients, as supported by Wang Q et al. IL6 is also believed to be associated with MTB infection and PTB susceptibility. Similarly, alteration of fibrosis-related TIMP1 has been closely linked to the pathological basis of PTB susceptibility, as shown by Marquis JF et al. Collectively, these findings indicate that the hub genes can act during PTB occurrence and development as immune regulators, therapeutic targets and potential biomarkers, and that they can affect PTB susceptibility and resistance to MTB infection. These findings also support our approach of mining effective PTB biomarkers from the 10 candidate hub genes. The remaining genes, F3, THBS1 and PLAU, have not yet been investigated for their role in improving PTB treatment.
Although the hub genes identified above may allow a relatively accurate prediction of PTB prognosis, this study has some limitations. Mycobacterial disease is multifactorial and, depending on the pathogen, can be divided into non-tuberculous mycobacteria (NTM) infections and MTB infection. NTM infections are caused by mycobacteria other than Mycobacterium tuberculosis, Mycobacterium bovis and Mycobacterium leprae, and their symptoms resemble those of MTB infection, which makes them hard to diagnose clinically. NTM are less virulent than MTB but produce similar clinical manifestations, and NTM infections are generally identified by bacterial culture. Studies suggest that patients show different physiological and biochemical responses to NTM and MTB infections. In a study of macrophages, Feng et al. reported that NF-κB activation was more pronounced with MTB than with NTM infection, and that IL-8, IL-10 and TNF-α differed between the infections. Additionally, Damayanti and Yudhawati found that the TNF-α level in pleural fluid differed between patients with NTM and MTB infections, being significantly higher in MTB patients. In the present study, owing to the lack of suitable data, TB patients infected by different pathogens were not analyzed separately. Moreover, this study is purely a bioinformatics analysis without any in vivo or in vitro data. Further analyses are therefore needed to gain more insight into the 10 hub genes and ultimately benefit patients with TB.
In summary, using a series of bioinformatics methods and a retrospective literature analysis, our study identified 7 hub genes that were closely associated with PTB development and prognosis and that have the potential to act as therapeutic targets and prognostic indicators. The limitations of this study will be addressed in our follow-up studies.
Availability of data and materials
The datasets analysed during the current study are available in the Gene Expression Omnibus repository, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54992.
Abbreviations
DEGs: Differentially expressed genes
GEO: Gene Expression Omnibus
LTBI: Latent tuberculosis infection
TB0: 0 month after initiation of anti-TB chemotherapy
TB3: 3 months after initiation of anti-TB chemotherapy
TB6: 6 months after initiation of anti-TB chemotherapy
KEGG: Kyoto Encyclopedia of Genes and Genomes
STRING: The Search Tool for the Retrieval of Interacting Genes/Proteins database
VOCs: Volatile organic compounds
CCL20: C-C motif chemokine ligand 20
CCR6: C-C motif chemokine receptor 6
PTGS2: Prostaglandin-endoperoxide synthase 2
ICAM1: Intercellular adhesion molecule 1
Igsf: Immunoglobulin super family
CXCL8: C-X-C motif chemokine ligand 8
Grace AG, Mittal A, Jain S, Tripathy JP, Satyanarayana S, Tharyan P, et al. Shortened treatment regimens versus the standard regimen for drug-sensitive pulmonary tuberculosis. Cochrane Database Syst Rev. 2019;12:CD012918.
Sambarey A, Devaprasad A, Mohan A, Ahmed A, Nayak S, Swaminathan S, et al. Unbiased identification of blood-based biomarkers for pulmonary tuberculosis by modeling and mining molecular interaction networks. EBioMedicine. 2017;15:112–26.
Walzl G, Ronacher K, Hanekom W, Scriba TJ, Zumla A. Immunological biomarkers of tuberculosis. Nat Rev Immunol. 2011;11(5):343–54.
Andrade BB, Pavan Kumar N, Mayer-Barber KD, Barber DL, Sridhar R, Rekha VV, et al. Plasma heme oxygenase-1 levels distinguish latent or successfully treated human tuberculosis from active disease. PLoS One. 2013;8(5):e62618.
Huang CT, Lee LN, Ho CC, Shu CC, Ruan SY, Tsai YJ, et al. High serum levels of procalcitonin and soluble TREM-1 correlated with poor prognosis in pulmonary tuberculosis. J Infect. 2014;68(5):440–7.
Jayakumar A, Vittinghoff E, Segal MR, MacKenzie WR, Johnson JL, Gitta P, et al. Serum biomarkers of treatment response within a randomized clinical trial for pulmonary tuberculosis. Tuberculosis (Edinb). 2015;95(4):415–20.
Mihret A, Bekele Y, Bobosha K, Kidd M, Aseffa A, Howe R, et al. Plasma cytokines and chemokines differentiate between active disease and non-active tuberculosis infection. J Infect. 2013;66(4):357–65.
Klassert TE, Goyal S, Stock M, Driesch D, Hussain A, Berrocal-Almanza LC, et al. AmpliSeq screening of genes encoding the C-type Lectin receptors and their signaling components reveals a common variant in MASP1 associated with pulmonary tuberculosis in an Indian population. Front Immunol. 2018;9:242.
Suzuki Y, Shirai M, Asada K, Yasui H, Karayama M, Hozumi H, et al. Macrophage mannose receptor, CD206, predict prognosis in patients with pulmonary tuberculosis. Sci Rep. 2018;8(1):13129.
Wang ZG, Guo LL, Ji XR, Yu YH, Zhang GH, Guo DL. Transcriptional Analysis of the Early Ripening of 'Kyoho' Grape in Response to the Treatment of Riboflavin. Genes (Basel). 2019;10(7).
Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47(D1):D607–D13.
Wang L, Zhang H, Ruan Y, Chin DP, Xia Y, Cheng S, et al. Tuberculosis prevalence in China, 1990-2010; a longitudinal analysis of national survey data. Lancet. 2014;383(9934):2057–64.
Liu Q, Zhu L, Shao Y, Song H, Li G, Zhou Y, et al. Rates and risk factors for drug resistance tuberculosis in northeastern China. BMC Public Health. 2013;13:1171.
Zhao G, Luo X, Han X, Liu Z. Combining bioinformatics and biological detection to identify novel biomarkers for diagnosis and prognosis of pulmonary tuberculosis. Saudi Med J. 2020;41(4):351–60.
Phillips M, Basa-Dalay V, Bothamley G, Cataneo RN, Lam PK, Natividad MP, et al. Breath biomarkers of active pulmonary tuberculosis. Tuberculosis (Edinb). 2010;90(2):145–51.
Schutyser E, Struyf S, Van Damme J. The CC chemokine CCL20 and its receptor CCR6. Cytokine Growth Factor Rev. 2003;14(5):409–26.
Jo EK, Park JK, Dockrell HM. Dynamics of cytokine generation in patients with active pulmonary tuberculosis. Curr Opin Infect Dis. 2003;16(3):205–10.
Lee JS, Lee JY, Son JW, Oh JH, Shin DM, Yuk JM, et al. Expression and regulation of the CC-chemokine ligand 20 during human tuberculosis. Scand J Immunol. 2008;67(1):77–85.
Dheda K, Barry CE 3rd, Maartens G. Tuberculosis. Lancet. 2016;387(10024):1211–26.
Pathak SK, Bhattacharyya A, Pathak S, Basak C, Mandal D, Kundu M, et al. Toll-like receptor 2 and mitogen- and stress-activated kinase 1 are effectors of Mycobacterium avium-induced cyclooxygenase-2 expression in macrophages. J Biol Chem. 2004;279(53):55127–36.
Wang L, Zuo M, Chen H, Liu S, Wu X, Cui Z, et al. Mycobacterium tuberculosis lipoprotein MPT83 induces apoptosis of infected macrophages by activating the TLR2/p38/COX-2 signaling pathway. J Immunol. 2017;198(12):4772–80.
Rand L, Green JA, Saraiva L, Friedland JS, Elkington PT. Matrix metalloproteinase-1 is regulated in tuberculosis by a p38 MAPK-dependent, p-aminosalicylic acid-sensitive signaling cascade. J Immunol. 2009;182(9):5865–72.
Rothlein R, Springer TA. The requirement for lymphocyte function-associated antigen 1 in homotypic leukocyte adhesion stimulated by phorbol ester. J Exp Med. 1986;163(5):1132–49.
van Dinther-Janssen AC, van Maarsseveen TC, Eckert H, Newman W, Meijer CJ. Identical expression of ELAM-1, VCAM-1, and ICAM-1 in sarcoidosis and usual interstitial pneumonitis. J Pathol. 1993;170(2):157–64.
Du SS, Zhao MM, Zhang Y, Zhang P, Hu Y, Wang LS, et al. Screening for differentially expressed proteins relevant to the differential diagnosis of Sarcoidosis and tuberculosis. PLoS One. 2015;10(9):e0132466.
Taylor JL, Hattle JM, Dreitz SA, Troudt JM, Izzo LS, Basaraba RJ, et al. Role for matrix metalloproteinase 9 in granuloma formation during pulmonary Mycobacterium tuberculosis infection. Infect Immun. 2006;74(11):6135–44.
Aryanpur M, Mortaz E, Masjedi MR, Tabarsi P, Garssen J, Adcock IM, et al. Reduced phagocytic capacity of blood monocyte/macrophages in tuberculosis patients is further reduced by smoking. Iran J Allergy Asthma Immunol. 2016;15(3):174–82.
Blok DC, Kager LM, Hoogendijk AJ, Lede IO, Rahman W, Afroz R, et al. Expression of inhibitory regulators of innate immunity in patients with active tuberculosis. BMC Infect Dis. 2015;15:98.
Wang Q, Han W, Niu J, Sun B, Dong W, Li G. Prognostic value of serum macrophage migration inhibitory factor levels in pulmonary tuberculosis. Respir Res. 2019;20(1):50.
Wu S, Wang Y, Zhang M, Shrestha SS, Wang M, He JQ. Genetic polymorphisms of IL1B, IL6, and TNFalpha in a Chinese Han population with pulmonary tuberculosis. Biomed Res Int. 2018;2018:3010898.
Marquis JF, Nantel A, LaCourse R, Ryan L, North RJ, Gros P. Fibrotic response as a distinguishing feature of resistance and susceptibility to pulmonary infection with Mycobacterium tuberculosis in mice. Infect Immun. 2008;76(1):78–88.
Ahmed I, Tiberi S, Farooqi J, Jabeen K, Yeboah-Manu D, Migliori GB, et al. Non-tuberculous mycobacterial infections-a neglected and emerging problem. Int J Infect Dis. 2020;92S:S46–50.
Feng Z, Bai X, Wang T, Garcia C, Bai A, Li L, et al. Differential responses by human macrophages to infection with Mycobacterium tuberculosis and non-tuberculous mycobacteria. Front Microbiol. 2020;11:116.
Damayanti N, Yudhawati R. The comparison of pleural fluid TNF-alpha levels in tuberculous and nontuberculous pleural effusion. Indian J Tuberc. 2020;67(1):98–104.
We sincerely thank the researchers who made their data available online through the GEO database, and we are pleased to acknowledge their contributions.
The study was sponsored by the National Natural Science Foundation of China (81570020) and the Shanghai Changhai Hospital Scientific Research Fund (2019SLZ002, 2019YXK018). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare no conflicts of interest.
Sun, Y., Chen, G., Liu, Z. et al. A bioinformatics analysis to identify novel biomarkers for prognosis of pulmonary tuberculosis. BMC Pulm Med 20, 279 (2020). https://doi.org/10.1186/s12890-020-01316-2
- Pulmonary tuberculosis
- Clustering analysis
- Enrichment analysis
- Hub gene
- PPI network