GPC3 affects the prognosis of lung adenocarcinoma and lung squamous cell carcinoma

Background Glypican 3 (GPC3) is a heparan sulphate proteoglycan whose expression is associated with several malignancies. However, reports of its expression in non-small-cell lung carcinoma (NSCLC) are limited and ambiguous. This study aimed to comprehensively evaluate the expression of GPC3 in NSCLC and develop a risk-score model for predicting the prognosis of NSCLC. Methods The gene expression profiles of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) were downloaded from the UCSC Xena database. Using the limma package, the differentially expressed genes (DEGs) between different comparison groups were analysed and the differential expression of GPC3 was calculated. A functional enrichment analysis was conducted for GPC3-associated genes using the DAVID tool. For the GPC3-associated genes shared by the four comparison groups, a protein–protein interaction network was built using the Cytoscape software. After conducting a survival analysis and a Cox regression analysis, the genes found to be significantly correlated with prognosis were selected to construct a risk-score model. In addition, the gene and protein levels of GPC3 were examined by quantitative reverse transcriptase-PCR (qRT-PCR) and immunohistochemistry (IHC) in LUSC tissues and paracancer tissues. Results The differential expression of GPC3 was significant (adjusted P < 0.05) in the NSCLC vs. normal, LUAD vs. normal, LUSC vs. normal, and LUAD vs. LUSC comparison groups. GPC3 directly interacted with SERPINA1, MFI2, and FOXM1. Moreover, GPC3 expression was significantly correlated with pathologic N, pathologic T, gender, and tumour stage in LUAD samples. Finally, the risk-score model (involving MFI2, FOXM1, and GPC3) for LUAD and that (involving SERPINA1 and FOXM1) for LUSC were established separately. The qRT-PCR results showed that GPC3 expression was much higher in the LUSC tissues than in the normal group. The IHC results further showed that GPC3 was highly expressed in LUSC tissues but weakly expressed in paracancer tissues. Conclusion The three-gene risk-score model for LUAD and the two-gene risk-score model for LUSC might be valuable in predicting the prognosis of these carcinomas.


Background
Lung cancer, a malignant tumour with the fastest increasing morbidity and mortality rates, is the greatest threat to human health and life [1]. Small-cell lung carcinoma (SCLC) and non-small-cell lung carcinoma (NSCLC) are the two main pathological types of lung cancer, with NSCLC accounting for approximately 85% of lung cancers [2]. NSCLC can be categorised into several subtypes, including lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), and lung large cell carcinoma (LCLC) [3]. Moreover, the occurrence of LUSC is tightly correlated with smoking, and it has been reported that the rate of exposure to smoking in LUSC patients exceeds 90% [4]. Clinically, only a small proportion of NSCLC patients is diagnosed at the early stages (stage I or II), and surgical resection is the most effective treatment for stage I, II, and IIIA NSCLC [5]. More than 60% of lung cancer patients have locally advanced or metastatic disease (stage III or IV) at the time of diagnosis and have lost the chance of radical treatment [5]. In patients who have undergone surgical treatment, there is a high risk of recurrence despite the possibility of complete remission. Therefore, it is important to understand the pathogenesis of NSCLC to improve treatment outcomes.
The past decades have witnessed rapid progress in the understanding of lung cancer pathology, and numerous dysregulated genes involved in NSCLC have been identified. A previous study demonstrated that astrocyte-elevated gene-1 (AEG-1) contributes to the formation and progression of NSCLC by regulating matrix metalloproteinase-9 (MMP9), resulting in an unfavourable clinical outcome [6]. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) overexpression indicates a poor prognosis in early-stage NSCLC, and the assessment of glucose metabolism has prognostic value in this tumour [7]. Hypoxia-inducible factor-2α (HIF-2α) expression is correlated with lymph-node metastasis, tumour size, tumour histology, and tumour stage, making it a potential candidate target for predicting the progression and clinical outcome of LUAD [8,9]. Decreased N-MYC downstream-regulated gene 2 (NDRG2) expression is important for the tumorigenesis of lung cancer and may be considered a valuable prognostic marker in lung cancer [10]. Increased Notch homolog 2 (Notch2) expression in LUAD patients can induce a high tumour recurrence rate, and high expression of Notch1 and Notch3 is related to adverse prognosis in LUAD [11]. Claudin-3 (CLDN3) in LUSC tissues is related to tumour progression and represses epithelial-mesenchymal transition (EMT) via activation of the Wnt pathway; therefore, CLDN3 may be a candidate biomarker for the prognosis and treatment of LUSC [12]. However, more genes affecting the prognosis of NSCLC still need to be explored.
Glypican 3 (GPC3) is a membrane-bound heparan sulphate proteoglycan encoded by a gene located on chromosome Xq26 [13]. It is highly expressed during foetal life, but its levels decrease after birth [14]. The expression patterns of GPC3 differ among cancer types, and its role is controversial. GPC3 is overexpressed in hepatocellular carcinoma (HCC), embryonal tumours, melanoma, hepatoblastoma, and testicular germ-cell tumours, in which it acts as an oncogene [15][16][17][18][19][20]. However, mutations or loss of expression have been reported in Simpson-Golabi-Behmel syndrome [21], ovarian carcinoma, breast cancer, and mesothelioma [22][23][24][25], suggesting that GPC3 functions as a tumour-suppressor gene. Currently, reports of GPC3 in lung cancer are limited and ambiguous. Kim et al. reported that GPC3 expression was decreased in LUAD compared with that in paired normal tissues [26]. In a study by Sarit et al., GPC3 was found to be overexpressed in LUSC (positive rate of 55%) but not in LUAD (positive rate of 8%), which might be induced by smoking [27].
In this study, the gene expression profiles of NSCLC were obtained. Differential expression and enrichment analyses for different comparison groups were then carried out. After the genes correlated with GPC3 were screened out, a protein-protein interaction (PPI) network analysis, survival analysis, and Cox regression analysis were conducted separately. The present results might help to elucidate the GPC3-correlated prognostic mechanisms of LUAD and LUSC.

Data source
samples and 59 normal samples) and LUSC (including 501 tumour samples and 49 normal samples) were downloaded. The clinical phenotypes (including smoking and sex) and prognostic information (including survival status and survival time) of these samples were also extracted; both were available for all samples.
For the DEGs of each comparison group, Gene Ontology (GO) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) [30,31] enrichment analyses were performed separately using the DAVID online tool [32] (version 6.7, https://david-d.ncifcrf.gov/). A false discovery rate (FDR) < 0.05 was the threshold for selecting significantly enriched results.

Construction of PPI network
For each comparison group, the genes involved in the significant GO/KEGG terms correlated with GPC3 were screened. Then, the intersections of these genes in different comparison groups were selected by drawing a Venn diagram [33], and the common genes were considered candidate genes. Using the STRING database [34] (http://www.string-db.org) and Cytoscape software [35] (https://cytoscape.org/), a PPI network was constructed to identify the hub genes and the genes directly correlated with GPC3.
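The screening logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene lists, the edge list, and the placeholder names `GENE_A` to `GENE_C` are hypothetical toy data, and the functions `shared_genes` and `direct_partners` are not part of STRING or Cytoscape. The sketch shows how a four-way Venn intersection yields the candidate genes and how direct partners of GPC3 can be read from an undirected edge list.

```python
from functools import reduce


def shared_genes(gene_sets):
    """Return the genes present in every comparison group (the Venn intersection)."""
    return reduce(set.intersection, (set(genes) for genes in gene_sets))


def direct_partners(edges, gene):
    """Return the genes joined to `gene` by an edge in an undirected PPI edge list."""
    partners = set()
    for a, b in edges:
        if a == gene:
            partners.add(b)
        elif b == gene:
            partners.add(a)
    return partners


# Hypothetical toy gene lists for the four comparison groups.
groups = {
    "NSCLC_vs_normal": ["GPC3", "FOXM1", "MFI2", "SERPINA1", "GENE_A"],
    "LUAD_vs_normal": ["GPC3", "FOXM1", "MFI2", "SERPINA1", "GENE_B"],
    "LUSC_vs_normal": ["GPC3", "FOXM1", "MFI2", "SERPINA1", "GENE_A"],
    "LUAD_vs_LUSC": ["GPC3", "FOXM1", "MFI2", "SERPINA1", "GENE_C"],
}

# Hypothetical toy PPI edges, as they might be exported from a network tool.
edges = [("GPC3", "FOXM1"), ("MFI2", "GPC3"), ("GPC3", "SERPINA1"), ("FOXM1", "MFI2")]

candidates = shared_genes(groups.values())
print(sorted(candidates))
print(sorted(direct_partners(edges, "GPC3")))
```

Treating the network as undirected matches how PPI pairs are usually listed, so an edge is counted regardless of which endpoint is written first.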

Correlation of GPC3 with clinical phenotypes
There were 585 LUAD and 550 LUSC samples. In addition to GPC3 expression, phenotypes such as age, location, years smoked, pathologic M, pathologic N, pathologic T, radiation therapy, sex, and tumour stage were also investigated. The baseline data of LUAD and LUSC samples are listed in Tables 1 and 2, respectively. GPC3 expression is the standardised expression value of log2(FPKM + 1). For two continuous characteristics, years smoked and GPC3 expression, the average value across samples with non-missing data was calculated, and these samples were then divided into high and low groups by comparison with the average. Lung cancer was most common between the ages of 45 and 65 years; therefore, the samples were divided into two groups based on age (≥ 65 or < 65 years). For each clinical phenotype, GPC3 expression was compared among the phenotype subgroups of the samples. The Wilcoxon rank sum test [36] was used for phenotypes with two groups, and the Kruskal-Wallis rank sum test [37] was used for phenotypes with more than two groups. A P value < 0.05 was set as the threshold.
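As a rough illustration of the grouping and two-group comparison described above, the following Python sketch splits a variable at the mean of its non-missing values and computes a rank sum test. Several points are assumptions rather than details from the study: values exactly equal to the mean are assigned to the low group, missing values are represented by `None`, and the P value uses a normal approximation without tie or continuity correction, so it may differ slightly from the output of a statistics package. The function names and the example values are hypothetical.

```python
import math


def mean_split(values):
    """Label each value 'high' or 'low' relative to the mean of the non-missing values.

    Missing values (None) are ignored when computing the mean and stay None.
    Assumption: values equal to the mean are labelled 'low'.
    """
    observed = [v for v in values if v is not None]
    cutoff = sum(observed) / len(observed)
    return [None if v is None else ("high" if v > cutoff else "low") for v in values]


def rank_sum_test(x, y):
    """Wilcoxon rank sum (Mann-Whitney U) test with a normal approximation.

    Returns (U, two-sided P value). U counts the pairs in which a value from x
    exceeds a value from y, with ties counted as one half.
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical years-smoked values with one missing entry.
print(mean_split([10.0, None, 30.0, 50.0]))

# Hypothetical GPC3 expression values in two phenotype subgroups.
u_stat, p = rank_sum_test([1.2, 1.5, 1.9], [2.4, 2.8, 3.1])
print(u_stat, round(p, 4))
```

For small groups an exact test is generally preferable to the normal approximation used here.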

Survival analysis
Based on the extracted prognostic information of the samples, the overall survival (OS) and OS status of the corresponding patients were determined. GPC3 and the genes directly correlated with GPC3 were considered as candidate features, and the patients were classified into high-expression and low-expression groups based on the median expression value of GPC3. Samples with expression greater than the median were assigned to the high-expression group, and samples with expression less than or equal to the median were assigned to the low-expression group. Combined with the prognostic information of the samples, Kaplan-Meier (KM) survival analysis [38] was carried out. The log-rank test [39] was used to calculate P values. A P value < 0.05 indicated a significant correlation.
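The median split and the KM estimator can be sketched as follows. This is a minimal stdlib-only illustration, not the software used in the study; the function names and the toy follow-up data are hypothetical, and the log-rank test is deliberately left out to keep the example short. Event status is assumed to be coded as 1 for death and 0 for censoring.

```python
from statistics import median


def split_by_median(expression):
    """Label samples 'high' if expression > median, otherwise 'low'."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Return the KM curve as a list of (event time, survival probability).

    `events` holds 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical expression values and follow-up data (time in years).
labels = split_by_median([2.1, 3.4, 0.8, 5.0])
print(labels)
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

In practice one curve would be estimated per expression group and the two curves compared with the log-rank test.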

Univariate and multivariate Cox regression analyses
Based on the expression levels of GPC3 and the genes directly correlated with GPC3, along with the prognostic information of the samples, univariate Cox regression analysis [40] was performed using the coxph() function in R [41]. The regression coefficient and P value of each factor with respect to survival time and survival status were calculated. Subsequently, a multivariate Cox regression analysis [42] was conducted for the factors with P < 0.05 to obtain the final risk-score model. The samples were divided into high-risk and low-risk groups based on their risk scores, and a KM survival analysis [43] was performed. Furthermore, the 1-year, 3-year, and 5-year survival rates of the samples were predicted based on their risk scores; receiver operating characteristic (ROC) curves [44] were drawn, and the corresponding area under the ROC curve (AUC) values were calculated.
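To make the AUC step more concrete, the sketch below computes a simplified AUC at a fixed time horizon from risk scores. It is an assumption-laden approximation, not the time-dependent ROC procedure used in the study: patients who died by the horizon are treated as cases, patients followed beyond the horizon as controls, and patients censored before the horizon are simply excluded. The function name and the toy data are hypothetical.

```python
def auc_at_horizon(times, events, scores, horizon):
    """Simplified AUC for predicting death by `horizon` from risk scores.

    Cases: death observed (event == 1) at or before the horizon.
    Controls: follow-up longer than the horizon.
    Patients censored at or before the horizon are excluded (a simplification).
    Returns None if there are no cases or no controls.
    """
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e == 1]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        return None
    # Probability that a random case has a higher score than a random control.
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical follow-up times (years), event status, and risk scores.
times = [1, 2, 2, 5, 6]
events = [1, 1, 0, 0, 1]
scores = [0.9, 0.1, 0.5, 0.2, 0.8]
for horizon in (1, 3, 5):
    print(horizon, auc_at_horizon(times, events, scores, horizon))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported values should be read.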

cBioPortal analysis
Genomic data from The Cancer Genome Atlas (TCGA) lung cancer dataset were retrieved using cBioPortal (https://www.cbioportal.org/) to identify mutations and copy number alterations (CNAs) of GPC3 [45]. The location and frequency of alterations in GPC3 and GPC3-related genes (amplifications, deep deletions, and missense mutations) and copy number variation data were evaluated.

Quantitative reverse transcriptase-PCR (qRT-PCR)
Total RNA was extracted from LUSC and paracancer tissues using the TRIzol Reagent (Invitrogen, USA). Then, the RNA was reverse-transcribed into cDNA using a PrimeScript™ RT kit with gDNA Eraser (TaKaRa, China). According to the manufacturer's instructions, qRT-PCR was performed using TB Green® Premix Ex Taq™ II (TaKaRa, Japan) on a CFX96 Real-Time PCR Detection System. GAPDH was used as an internal reference gene. The 25 μL reaction mixture for qRT-PCR contained 8.5 μL of sterile purified water, 12.5 μL of TB Green Premix Ex Taq II (Tli RNaseH Plus) (2×), 1 μL of PCR forward primer (10 μM), 1 μL of PCR reverse primer (10 μM), and 2 μL of cDNA template (< 100 ng). The reaction conditions for qRT-PCR were as follows: initial denaturation at 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s for denaturation and annealing/elongation, respectively. The 2^−ΔΔCT method was used to calculate relative expression.
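The relative quantification step can be illustrated with a short Python sketch of the 2^−ΔΔCT calculation, with GAPDH as the reference gene. The function name and the Ct values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method.

    dCt = Ct(target gene) - Ct(reference gene), computed separately for the
    sample (e.g. tumour tissue) and the control (e.g. paracancer tissue);
    ddCt = dCt(sample) - dCt(control).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: GPC3 (target) and GAPDH (reference).
fold_change = relative_expression(
    ct_target_sample=24.0,   # GPC3 in tumour tissue
    ct_ref_sample=18.0,      # GAPDH in tumour tissue
    ct_target_control=26.0,  # GPC3 in paracancer tissue
    ct_ref_control=18.0,     # GAPDH in paracancer tissue
)
print(fold_change)  # ddCt = (24 - 18) - (26 - 18) = -2, so the fold change is 4.0
```

A value above 1 indicates higher expression in the sample than in the control, and a value below 1 indicates lower expression.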

Immunohistochemistry (IHC) validation
Cancer tissue specimens and paraffin sections of adjacent tissues were collected from 10 patients undergoing pulmonary malignant tumour surgery at the Liaoning Cancer Hospital and Institute between June 2018 and June 2020. The study was approved by the Ethics Committee of the Liaoning Cancer Hospital and Institute. The paraffin specimens of LUSC patients obtained after surgery were cut into two pieces for IHC staining. The sections were deparaffinized, and antigen retrieval was performed with citrate buffer (pH 6.0) under high temperature and pressure, followed by natural cooling to room temperature and dilution with PBS. After incubation in 3% BSA for 1 h, the sections were incubated with an anti-GPC3 antibody overnight at 4 °C. Subsequently, the sections were incubated with secondary antibodies (Zsbio, China) and stained with diaminobenzidine (DAB) and haematoxylin.

[…] (Fig. 1). Therefore, the DEGs in these four groups were selected for further analysis. There were 134 GO/KEGG terms (such as response to wounding, immune response, and cell adhesion, involving 1513 genes) enriched for the DEGs in the NSCLC vs. normal, 105 terms (such as response to wounding, cell adhesion, and biological adhesion, involving 1113 genes) in the LUAD vs. normal, 133 terms (such as response to wounding, mitosis, and nuclear division, involving 1701 genes) in the LUSC vs. normal, and 38 terms (such as ectoderm development, epidermis development, and epithelium development, involving 563 genes) in the LUAD vs. LUSC comparison groups.
Among these, 19 terms (such as regulation of cell proliferation, tube development, and epithelium development, involving 1015 genes) in the NSCLC vs. normal comparison group, 17 terms (such as regulation of cell proliferation, tube development, and branching morphogenesis of a tube, involving 803 genes) in the LUAD vs. normal comparison group, 18 terms (such as regulation of cell proliferation, tube development, and epithelium development, involving 1128 genes) in the LUSC vs. normal comparison group, and 12 terms (such as epithelium development, regulation of cell proliferation, and tube morphogenesis, involving 473 genes) in the LUAD vs. LUSC comparison group were correlated with GPC3.

Construction of PPI network
Venn diagrams showed that 10 GO/KEGG terms (such as regulation of cell proliferation, plasma membrane, and extracellular space) (Fig. 2a) and 102 genes (Fig. 2b) correlated with GPC3 were shared by the four comparison groups. For the 102 common genes (including GPC3), PPI pairs were predicted, and the PPI network (including 148 edges) was visualised (Fig. 3). In the PPI network, GPC3 directly interacted with serpin family A member 1 (SERPINA1), melanotransferrin (MFI2), and forkhead box M1 (FOXM1).

Survival analysis
For GPC3 and the genes (including SERPINA1, MFI2, and FOXM1) directly interacting with it, gene expression levels were correlated with the prognostic information of the samples to perform KM survival analysis separately for LUAD and LUSC. The results showed that GPC3, MFI2, and FOXM1 were significantly correlated with the prognosis of LUAD patients (Fig. 5a), and GPC3, SERPINA1, and FOXM1 were significantly correlated with the prognosis of LUSC patients (Fig. 5b).

Univariate and multivariate Cox regression analyses
The results of univariate Cox regression analysis showed that MFI2, FOXM1, and GPC3 had a significant influence on the survival time of LUAD patients (P < 0.05, Table 5). A multivariate Cox regression analysis further suggested that the combination of MFI2, FOXM1, and GPC3 could significantly affect the prognosis (P values of the likelihood ratio test, Wald test, and score (log rank) test were 2.908e−05, 1.105e−05, and 8.474e−06, respectively). Finally, the following risk-score model was established: Risk score = 0.26187 × MFI2 + 0.07721 × FOXM1 − 0.01199 × GPC3.
Based on the risk-score model, the samples were classified into high-risk and low-risk groups. A KM survival analysis indicated that the survival time of the low-risk group was significantly longer than that of the high-risk group (P = 0.00032, Fig. 6a). The AUC values for 1-year, 3-year, and 5-year survival predicted by the risk-score model were all approximately 0.6 (Fig. 6b).
Meanwhile, a univariate Cox regression analysis showed that SERPINA1 and FOXM1 had significant effects on the survival time of LUSC patients (P < 0.05, Table 6). SERPINA1 and FOXM1 were therefore included in the multivariate Cox regression analysis, and the P values of the likelihood ratio test, Wald test, and score (log rank) test were 0.002019, 0.002015, and 0.001904, respectively. The following risk-score model was constructed: Risk score = 0.12417 × SERPINA1 + 0.00518 × FOXM1.
The LUSC samples were divided into high-risk and low-risk groups based on the risk-score model, and a KM survival analysis showed that the survival time of the low-risk group was significantly longer than that of the high-risk group (P = 0.039, Fig. 6c). Similarly, the AUC values for 1-year, 3-year, and 5-year survival predicted by the risk-score model were all approximately 0.6 (Fig. 6d).
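As a worked illustration, the two risk-score models can be applied to expression values as in the Python sketch below. The coefficients are those reported above; everything else is an assumption made for illustration: the expression values are invented, the function names are hypothetical, and the median risk score is used as the cutoff between high-risk and low-risk groups because the cutoff is not stated in the text.

```python
from statistics import median

# Coefficients taken from the two risk-score models reported in the text.
LUAD_COEFS = {"MFI2": 0.26187, "FOXM1": 0.07721, "GPC3": -0.01199}
LUSC_COEFS = {"SERPINA1": 0.12417, "FOXM1": 0.00518}


def risk_score(expression, coefs):
    """Weighted sum of gene expression values using the model coefficients."""
    return sum(coef * expression[gene] for gene, coef in coefs.items())


def risk_groups(scores):
    """Split samples at the median risk score (assumed cutoff, not from the source)."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


# Hypothetical LUAD expression profiles (log2(FPKM + 1) scale assumed).
luad_samples = [
    {"MFI2": 1.0, "FOXM1": 2.0, "GPC3": 3.0},
    {"MFI2": 4.0, "FOXM1": 3.5, "GPC3": 0.5},
    {"MFI2": 0.5, "FOXM1": 1.0, "GPC3": 4.0},
    {"MFI2": 3.0, "FOXM1": 4.0, "GPC3": 1.0},
]
luad_scores = [risk_score(sample, LUAD_COEFS) for sample in luad_samples]
print([round(score, 5) for score in luad_scores])
print(risk_groups(luad_scores))
```

The negative coefficient for GPC3 in the LUAD model means that higher GPC3 expression lowers the score, whereas higher MFI2 or FOXM1 expression raises it.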

GPC3 expression analysis with qRT-PCR
qRT-PCR was used to determine the expression of GPC3 in LUSC. GPC3 expression was much higher in the LUSC tissues than in the control group (P < 0.001, Fig. 8). This is consistent with the previous sequencing results.

IHC analysis of GPC3 protein expression
IHC analysis revealed positive staining of GPC3 protein in seven of the ten (70%) paraffin-embedded LUSC tissues, while negative staining was observed in the remaining cases (three of ten, 30%) (Fig. 9). This suggests that the high expression of GPC3 may be related to the occurrence of LUSC.

Discussion
GPC3 is a heparan sulphate proteoglycan whose expression is associated with several malignancies. However, reports of its expression in lung cancer are limited and remain ambiguous. This study found that GPC3 is significantly downregulated in NSCLC tissues compared with that in paracancer tissues, and its expression in LUAD is significantly lower than that in LUSC. In LUAD samples, GPC3 expression was significantly correlated with pathologic N, pathologic T, gender, and tumour stage. However, GPC3 expression was not significantly associated with these clinical phenotypes in LUSC. GPC3 overexpression in liver cancer has been consistently reported; however, its expression pattern in NSCLC remains debatable. Kim et al. reported that GPC3 expression was decreased in LUAD compared with that in paired normal tissues [26]. In a study by Sarit et al., GPC3 was found to be overexpressed in LUSC (positive rate of 55%) but not in LUAD (positive rate of 8%), which might be induced by smoking [27]. The present study found that the expression of GPC3 in NSCLC is lower than that in paracancer tissues, which is in line with the study by Kim et al. [26] and in contradiction with the study by Sarit et al. [27]. In addition, the present study found that GPC3 was not differentially expressed in the smoker vs. non-smoker group or the LUSC smoker vs. LUSC non-smoker group, which also differs from the study by Sarit et al. These discrepancies may be due to differences in the methods used to detect GPC3 expression and in sample size. Sarit et al. used immunohistochemistry on tissue microarrays to evaluate the expression of GPC3 in 97 patients, whereas the present study was based on high-throughput sequencing data of more than 500 samples in the TCGA database.
The molecular mechanism of action of GPC3 in cancer remains unclear. In this study, a PPI network was developed for GPC3-associated genes. In this network, GPC3 was directly connected to SERPINA1, MFI2, and FOXM1, indicating that GPC3 might also act in NSCLC by interacting with SERPINA1, MFI2, and FOXM1. Germline mutations of MFI2 are significantly more frequent in young non-smoking LUAD patients, which may be implicated in the pathogenesis of LUAD [46]. FOXM1 functions in cell cycle progression, and increased FOXM1 expression is related to unfavourable outcomes of NSCLC due to the promotion of cell metastasis [47]. FOXM1 plays a role in EMT induced by TGF-beta1, and miR-134 functions as an EMT suppressor by targeting FOXM1 in NSCLC cells [48]. FOXM1 overexpression contributes to cell invasion and migration in NSCLC, which are responsible for the adverse survival of patients with this disease [49,50]. FOXM1 can affect gefitinib resistance in NSCLC cells in vitro, making this gene a target for reducing resistance to gefitinib [51]. These results suggest that FOXM1 may be correlated with the prognosis of patients with NSCLC. Combined with Cox regression analysis, a risk-score model consisting of the prognosis-associated genes MFI2, FOXM1, and GPC3 was developed for LUAD prognosis, and the AUC values for 1-year, 3-year, and 5-year survival were all approximately 0.6, indicating stable but modest discriminative ability.
In LUSC, the expression of GPC3 was not significantly related to survival time (P > 0.05). Therefore, a risk-score model consisting of the prognosis-associated genes SERPINA1 and FOXM1 was established for LUSC. α-1 antitrypsin (AAT) is a serine proteinase inhibitor that plays an antiprotease protective role in the human body, and mutations in the gene SERPINA1 can lead to chronic obstructive pulmonary diseases by inducing AAT deficiency [52]. The SERPINA1 PiMZ genotype, combined with smoking, causes lung-function decline by modifying the association between occupational exposure and longitudinal change in lung function [53]. Increased SERPINA1 gene expression promotes tumour cell migration, apoptosis resistance, and colony formation, and SERPINA1 and its corresponding protein, AAT, influence the mechanisms of lung cancer [54]. Thus, SERPINA1 may influence the outcomes of NSCLC patients.

Conclusion
In conclusion, GPC3 was significantly downregulated in NSCLC tissues compared with that in paracancer tissues, and its expression in LUAD was significantly lower than that in LUSC. In LUAD samples, GPC3 expression was significantly correlated with pathologic N, pathologic T, gender, and tumour stage. The PPI network showed that GPC3 can interact with SERPINA1, MFI2, and FOXM1 directly. In addition, the three-gene risk-score model (involving MFI2, FOXM1, and GPC3) for LUAD and the two-gene risk-score model (involving SERPINA1 and FOXM1) for LUSC might be useful in predicting the prognosis of patients with these tumours. However, the utility of the risk-score models should be further validated in subsequent experiments.