- Research article
- Open Access
Propensity score analysis of lung cancer risk in a population with high prevalence of non-smoking related lung cancer
BMC Pulmonary Medicine volume 17, Article number: 120 (2017)
Lung cancer has been the leading cause of cancer-related mortality worldwide among both men and women in recent years. There is an increase in the incidence of nonsmoking-related lung cancer in recent years. The purpose of the present study was to investigate multiple potential risk factors for nonsmoking-related lung cancer among Asian Ethnic Groups.
We used a propensity score-mated cohort analysis for this study. We retrospectively review the medical record of 1975 asymptomatic healthy subjects (40 ~ 80 years old) who voluntarily underwent low-dose chest CT from August 2013 to October 2014. Clinical information and nodule characteristics were recorded.
A propensity score-mated cohort analysis was applied to adjust for potential bias and to create two comparable groups according to family history of lung cancer. For our primary analysis, we matched 392 pairs of subjects with family history of lung cancer and subjects without history. Logistic regression showed that female gender and a family history of lung cancer were the two most important predictor of lung cancer in the endemic area with high prevalence of nonsmoking-related lung cancer (OR = 11.199, 95% CI = 1.444–86.862; OR = 2.831, 95% CI = 1.000136–8.015). In addition, the number of nodules was higher in subjects with family history of lung cancer in comparison with subjects without family history of lung cancer (OR = 1.309, 95% CI = 1.066–1.607).
In conclusion, risk-based prediction model based on the family history of lung cancer and female gender can potentially improve efficiency of lung cancer screening programs in Taiwan.
Lung cancer has been the leading cause of cancer-related mortality worldwide among both men and women in recent years [1,2,3]. The landmark National Lung Screening Trial (NLST) evaluated the benefits of low-dose computed tomography (LDCT) for screening of heavy smokers (≥30 pack-years) and found that annual screening by LDCT yielded a relative reduction of lung cancer mortality of 20% among those screened when compared to chest radiography . Smoking is the major risk factor for lung cancer, but an increase in the incidence of nonsmoking-related lung cancer in recent years has been addressed [4,5,6,7,8]. There has been an increase in the prevalence of non-smoking associated lung cancers in Asian countries such as China, Taiwan, Korea, and Japan over the past few years [9, 10]. Previous studies suggested that a potential association among nonsmokers who had lung adenocarcinoma with associated risk factors such as age, gender, body mass index (BMI), history of lung cancer, and personal cancer history [8, 11]. A major concern has remained regarding that selection bias that occurs as a result of self-referral or physician referral in the setting of these studies designs, which is ordinarily considered a threat to both internal and external validity of the studies [8, 11]. Propensity score matching method is increasingly being used currently and a useful statistical technology in observational studies to ensure that propensity score is balanced across treatment and control groups as an alternative to conventional covariate adjustment in logistic regression models . Using propensity score matching analysis, clinical/demographic characteristics of subjects between the groups with family history of lung cancer (+) versus without family history of lung cancer (−) could be balanced out, thus mimicking randomized controlled trial design. The purpose of the present study was to investigate potential risk factors for nonsmoking-related lung cancer among Asian population based on propensity score matching analysis which could reduce selection bias and potential baseline differences between the two groups.
Study population and cohort
A flow diagram describing the subject recruitment and exclusions is shown in Fig. 1. We retrospectively analyzed 1975 (1083 males and 892 females) asymptomatic healthy subjects (age range 40 to be 80-year-old) who voluntarily underwent self-paid LDCT exam at the health check-up center of Kaohsiung Veterans General Hospital from August 2013 to October 2014. Clinical information included gender, age, BMI, family history of lung cancer, and family history of other cancers in first and second-degree relatives was collected. Moreover, nodular characteristics were recorded according to ACR Lung-RADS classification shown in Table 1 [13, 14]. Categories1 (negative) and 2 (benign appearance) correspond to negative screening results, and categories 3 (probably benign) and 4 (suspicious) correspond to positive screening results. Category 4 is divided into 4A, 4B, and 4X, based on the level of suspicion of malignancy according to the nodule size and characteristics summarized in Table 1. Increases in the probability of malignancy are expressed by assigning either subcategory, 4A (5%–15%), 4B (>15%), or 4X (additional finding such as spiculation or enlarged lymph nodes). The average follow-up time of subjects with suspicious nodules was 1.6 ± 0.5 years after the baseline LDCT.
Among 1975 screened subjects, 72.8% (1438/1975) of the screened subjects were never-smokers, 16.5% (326/1975) were current smokers, and 10.7% (211/1975) were former smokers. Only 7.5% (149/1975) of the study subjects would have been eligible for screening based on the NLST enrollment criteria. Among 1975 screened subjects, there were 27 subjects diagnosed with non-smoking related lung cancer (two lung cancer subjects with smoking were excluded). Definition of non-smoking related lung cancer was defined as the lung adenocarcinoma spectrum such as adenocarcinoma in situ, minimally invasive adenocarcinoma, and invasive adenocarcinoma diagnosed by surgical or biopsy proof. Histopathologic diagnosis of atypical adenomatous hyperplasia was excluded from this study.
Covariate and propensity score matching
All the subjects were divided to two groups: the group with family history of lung cancer (398 subjects) and the group without family history of lung cancer (1577 subjects). However, 87 patients were excluded because of missing data on BMI profiles. We used a 1:1 propensity score-matched pair method combined with covariate adjustment to analyze patients with and without family history of lung cancer shown in Fig. 1. The unbalanced conditions at baseline between the two groups were controlled by using PS matching with covariate adjustment. The 1:1 PS matching yielded matched pairs of 392 subjects with family history of lung cancer and 392 patients without family history of lung cancer, resulting in no differences in age, gender, BMI, and the proportion of other cancers of family history.
LDCT imaging acquisition and interpretation
All scans were performed with a 16-slice multi-detector CT (Somatom Sensation 16, Siemens Healthcare, Erlangen, Germany) and a 64-slice multi-detector CT (Aquilion 64; Toshiba Medical Systems) from the lung apex to the base without contrast enhancement. The LDCT examination protocols met the CMS (Centers for Medicare & Medicaid Services) requirement of the volume CT dose index (CTDIvol) ≤ 3.0 milligray (mGy) for standard-size patients based on recommendations of the ACR and Society of Thoracic Imaging for different vendors setting . Scans were obtained with the subjects in supine position at end inspiration. The data were reconstructed with filtered back projection, a slice thickness of 2 mm, and an increment of 2 mm, using a smooth convolution kernel (Siemens B30f, Toshiba FC02). All studies were evaluated on lung and mediastinal windows on a picture-archiving and communication system and reported by two experienced thoracic radiologists with 8 and 12 years of experience, respectively.
Statistical analysis was performed using SPSS® v17.0 for Windows (SPSS, Inc., Chicago, IL) and the SAS® software package (SAS Institute, Inc., Cary, NC). To minimize the effect of potential confounders on selection bias, propensity scores were generated by using the multiple logistic regressions to estimate the probability that subjects have family history of lung cancer or not. The covariates entered into the propensity score were age, gender, and BMI. Propensity score matching (1:1 match) was performed to adjust for differences in baseline clinical characteristics, yielding a total of 784 subjects: 392 subjects with family history of lung cancer and 392 subjects without family history of lung cancer (SAS Institute, Inc., Cary, NC).
Baseline characteristics were performed as mean ± standard deviation (SD).
Comparisons between the two groups were performed by using the independent T-test for continuous data and chi-square test for categorical data before and after PS matching. The Fisher exact chi-square test was used to analyze when the smallest expected value is less than 5. Multiple logistic regression models were developed, and odds ratios (ORs) were used to evaluate risk factors associated with lung cancer. Data analysis was performed using SPSS® v17.0 for Windows (SPSS, Inc., Chicago, IL).
We retrospectively review the medical record of 1975 asymptomatic healthy subjects (40 ~ 80 years old) who voluntarily underwent low-dose chest CT (1083 males, 892 females) from August 2013 to October 2014. We identified 398 patients with family history of lung cancer while the other 1577 patients without family history of lung cancer shown in Fig. 1. The baseline characteristics in the pre-match and post-match cohorts are presented in Table 2.
Baseline characteristics before propensity matching
Patients were significantly younger in the family history of lung cancer (+) group compared with the family history of lung cancer (−) group (56.1 ± 9.39 years old versus 58.39 ± 7.05 years old); the BMI in the family history of lung cancer (+) group is lower compared with the family history of lung cancer (−) group (23.76 ± 3.36 kg/m2 versus 24.46 ± 3.50 kg/m2); there were more nodules in the family history of lung cancer (+) group compared with the family history of lung cancer (−) group (1.09 ± 1.53 versus 0.51 ± 1.027). There were several parameters of baseline characteristics statistically higher in the family history of lung cancer (+) group, including the percentage of female gender (65.9% vs. 39.95%), the percentage of category 4 lesions (5.27% vs. 2.02%), the percentage of family history of other cancers (36.1% vs. 30.2%), and the percentage of lung cancer (3.76% vs. 0.76%).
Among 27 subjects with non-smoking related lung cancer diagnosed, 8 (29.62%) subjects had a diagnosis of synchronous multiple primary lung cancers (MPLCs) according to the diagnostic criteria proposed by Martini and Melamed before propensity score matching . Among 20 (35%) subjects with non-smoking related lung cancer diagnosed, 7 subjects had a diagnosis of synchronous MPLCs according to the diagnostic criteria proposed by Martini and Melamed after propensity score matching .
To further investigate this imbalance, we illustrate histogram of the distribution of the propensity score for both groups before and after propensity matching. Figure 2a presents histograms of unbalanced propensity score distribution for both groups before propensity matching. Figure 2b presents histograms of balanced propensity score distribution for both groups after the propensity matching.
Baseline characteristics after propensity matching
According to the propensity score matching 1:1 shown in Table 2 , 392 patients in the family history of lung cancer (+) group were matched with 392 in the family history of lung cancer (−) group. The matching process eliminated some significant differences that existed between the family history of lung cancer (+) group and the family history of lung cancer (−) group such as age, sex, BMI, the percentage of family history of other cancers, and category 4 lesions, while the nodule numbers and the percentage of lung cancer remained significant different.
Univariate and multivariate logistic regression analysis for lung cancer risk
Table 3 lists the univariate and multivariate logistic regression analyses to determine the predictors of lung cancer. Female gender (univariate model: OR = 10.149, 95% confidence interval (CI) = 1.351–76.227; multivariate model: OR = 11.199, 95% CI = 1.444–86.862), nodule number (univariate model: OR = 1.353, 95% CI = 1.114–1.642; multivariate model: OR = 1.309, 95% CI = 1.066–1.607), and family history of lung cancer (univariate model: OR = 3.08,95% CI = 1.108–8.557; multivariate model: OR = 2.831, 95% CI = 1.000136–8.015) were significant associated with lung cancer both on univariate and multivariate analysis.
In this retrospective analysis applying propensity score matching in order to minimize confounding effects and selection bias to estimate the true causal effect, we demonstrated three major findings. The first one is that to utilize the propensity score matching to adjust for selection bias could address the balanced baseline characteristics between exposure and control subjects and improve the internal validity of the study. The second finding is family history of lung cancer and female gender were significantly associated with lung cancer based on univariate or multivariate logistic regression. Previous studies have addressed the issue that family history of lung cancer significantly association with non-smoking related lung cancer, mainly in middle-age women of Asian population. However, these results were based on data available from previous case-control or retrospective cohort studies which more susceptible to the effects of selection bias [7, 8]. The present study demonstrated for the first time that identification two important associated risk factors with lung cancer in an Asian cohort with less smoker using a propensity score matching method to construct quasi-experimental design intended to stimulate randomized controlled trial (RCT) design and minimize the selection bias . Familial risk of lung cancer is attributable to share more complex genetic and environmental factors [18,19,20]. Our study demonstrated that familial history of lung cancer significantly associated with non-smoking related lung cancer, especially in women. In addition, another study demonstrated that women with a history of lung infection (bronchitis or pneumonia) positively influenced lung cancer development . The third finding, increasing numbers of nodules were significantly associated with lung cancer in an Asian population, mainly non-smoker. The reported incidence of synchronous MPLCs in patients with lung cancer in our study is high up to 35% (one example case shown in Fig. 3). The incidence of synchronous MPLC has been reported to range from 0.7% to 30% of patients with lung cancer in the previous literature reviews [8, 22,23,24]. This study result support that high prevalence of Multifocal ground glass/lepidic (GG/L) lung cancer, a kind of lung adenocarcinoma subtype which often occurred in Asian women or non-smoker recently proposed by the International Association for the Study of Lung Cancer (IASLC) Lung Cancer Staging Project in 2016 . In addition, the overall lung cancer prevalence rate was 1.40% (27/1975) in this study cohort. Our results are congruent with other published data from Asian population in the group of non-smokers or lesser smokers (lung cancer prevalence rate 1 ~ 2% at the baseline LDCT screening) [25, 26]. Our study population consists mainly of non-smokers, which is very different from the NLST and other LDCT lung cancer screening studies conducted outside of Asia [3, 27]. Recent studies have investigated more detail about the diagnosis, management and prognosis of Multifocal GG/L lung cancer [28,29,30]. This issue should be more emphasized in Asian lung cancer screening program due to high prevalence of synchronous MPLC reported according to previous and the current studies [8, 22].
There are several limitations to our study. First, propensity score matching can only control for observed covariates such as age, BMI or sex in the study. However, any unobserved covariates (cooking, second-hand smoking and air pollution) cannot be adjusted to balancing baseline characteristics between exposure and unexposed with reducing selection bias . Second, propensity score matching methods resulted in throwing out over half of the subjects in the unexposed group, reducing the overall sample size and negatively affecting statistical power. To maximize our statistical power to detect this effect, it is mandatory to perform a much larger cohort in an Asian population. Third, a large number of subjects are eliminated after propensity scoring matching because of limited numbers within the exposure group despite the algorithm of full matching. Thus further large cohort studies are needed to establish generalizability of these study results because of the loss of study subjects numbers threatening external validity.
In conclusion, in this retrospective analysis applying propensity score matching in order to minimize confounding effects and identify two important risk factors of female gender and family history of lung cancer for non-smoker lung cancer prediction. In the future, risk-based prediction model based on the family history of lung cancer and female gender can potentially improve efficiency of lung cancer screening programs in Taiwan.
American College of Radiology
Body mass index
Centers for Medicare & Medicaid Services
Volume CT dose index
International Association for the Study of Lung Cancer
Low-dose computed tomography
Lung Imaging Reporting and Data System
Multiple primary lung cancers
National Lung Screening Trial
Randomized controlled trial
Siegel R, Naishadham D, Jemal A. Cancer statistics, 2013. CA Cancer J Clin. 2013;63(1):11–30.
Siegel R, Ma J, Zou Z, Jemal A. Cancer statistics, 2014. CA Cancer J Clin. 2014;64(1):9–29.
National Lung Screening Trial Research T, Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, Fagerstrom RM, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365(5):395–409.
Samet JM, Avila-Tang E, Boffetta P, Hannan LM, Olivo-Marston S, Thun MJ, et al. Lung cancer in never smokers: clinical epidemiology and environmental risk factors. Clinical cancer research: an official journal of the American Association for Cancer Research. 2009;15(18):5626–45.
Wakelee HA, Chang ET, Gomez SL, Keegan TH, Feskanich D, Clarke CA, et al. Lung cancer incidence in never smokers. J Clin Oncol Off J Am Soc Clin Oncol. 2007;25(5):472–8.
Ko YC, Lee CH, Chen MJ, Huang CC, Chang WY, Lin HJ, et al. Risk factors for primary lung cancer among non-smoking women in Taiwan. Int J Epidemiol. 1997;26(1):24–31.
Lo Y-L, Hsiao C-F, Chang G-C, Tsai Y-H, Huang M-S, Su W-C, et al. Risk factors for primary lung cancer among never smokers by gender in a matched case–control study. Cancer Causes Control. 2013;24(3):567–76.
Wu F-Z, Huang Y-L, Wu CC, Tang E-K, Chen C-S, Mar G-Y, et al. Assessment of selection criteria for low-dose lung screening CT among Asian ethnic groups in Taiwan: from mass screening to specific risk-based screening for non-smoker lung cancer. Clinical Lung Cancer. 2016;17:e45–56.
Detterbeck FC, Nicholson AG, Franklin WA, Marom EM, Travis WD, Girard N, et al. The IASLC lung cancer staging project: summary of proposals for revisions of the classification of lung cancers with multiple pulmonary sites of involvement in the forthcoming eighth edition of the TNM classification. J Thorac Oncol. 2016;11(5):639–50.
Subramanian J, Govindan R. Lung cancer in never smokers: a review. J Clin Oncol Off J Am Soc Clin Oncol. 2007;25(5):561–70.
Wu X, Wen CP, Ye Y, Tsai M, Wen C, Roth JA, et al. Personalized risk assessment in never, light, and heavy smokers in a prospective cohort in Taiwan. Sci Rep. 2016;6:36482.
Elze MC, Gregson J, Baber U, Williamson E, Sartori S, Mehran R, et al. Comparison of propensity score methods and covariate adjustment. Evaluation in 4 cardiovascular studies. J Am Coll Cardiol. 2017;69(3):345–57.
McKee BJ, Regis SM, McKee AB, Flacke S, Wald C. Performance of ACR lung-RADS in a clinical CT lung screening program. J Am Coll Radiol. 2015;12(3):273–6.
Pinsky PF, Gierada DS, Black W, Munden R, Nath H, Aberle D, et al. Performance of lung-RADS in the National Lung Screening Trial: a retrospective assessment. Ann Intern Med. 2015;162(7):485–91.
Kazerooni EA, Austin JH, Black WC, Dyer DS, Hazelton TR, Leung AN, MF MN-G, et al. ACR-STR practice parameter for the performance and reporting of lung cancer screening thoracic computed tomography (CT): 2014 (resolution 4). J Thorac Imaging. 2014;29(5):310–6.
Martini N, Melamed MR. Multiple primary lung cancers. J Thorac Cardiovasc Surg. 1975;70(4):606–12.
Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res. 2011;46(3):399–424.
Kanwal M, Ding X-J, Cao Y. Familial risk for lung cancer. Oncol Lett. 2017;13(2):535–42.
Hemminki K, Dong C, Vaittinen P. Cancer risks to spouses and offspring in the family-cancer database. Genet Epidemiol. 2001;20(2):247–57.
Hemminki K, Li X. Familial risk for lung cancer by histology and age of onset: evidence for recessive inheritance. Exp Lung Res. 2005;31(2):205–15.
Osann KE. Lung cancer in women: the importance of smoking, family history of cancer, and medical history of respiratory disease. Cancer Res. 1991;51(18):4893–7.
van Rens MT, Zanen P, Brutel de La Riviere A, Elbers HR, van Swieten HA, van Den Bosch JM. Survival in synchronous vs. single lung cancer: upstaging better reflects prognosis. Chest. 2000;118(4):952–8.
Ferguson MK, DeMeester TR, DesLauriers J, Little AG, Piraux M, Golomb H. Diagnosis and management of synchronous lung cancers. J Thorac Cardiovasc Surg. 1985;89(3):378–85.
Lam S, MacAulay C, Palcic B. Detection and localization of early lung cancer by imaging techniques. Chest. 1993;103(1 Suppl):12s–4s.
Yi CA, Lee KS, Shin M-H, Cho YY, Choi Y-H, Kwon OJ, et al. Low-dose CT screening in an Asian population with diverse risk for lung cancer: a retrospective cohort study. Eur Radiol. 2015;25(8):2335–45.
Chen C-Y, Chen C-H, Shen T-C, Cheng W-C, Hsu C-N, Liao C-H, et al. Lung cancer screening with low-dose computed tomography: experiences from a tertiary hospital in Taiwan. J Formos Med Assoc. 2016;115(3):163–70.
Fintelmann FJ, Bernheim A, Digumarthy SR, Lennes IT, Kalra MK, Gilman MD, et al. The 10 pillars of lung cancer screening: rationale and logistics of a lung cancer screening program. Radiographics. 2015;35(7):1893–908.
Zhang Z, Gao S, Mao Y, Mu J, Xue Q, Feng X, He J. Surgical outcomes of synchronous multiple primary non-small cell lung cancers. Sci Rep. 2016;6:23252.
Xue X, Liu Y, Pan L, Wang Y, Wang K, Zhang M, et al. Diagnosis of multiple primary lung cancer: a systematic review. J Int Med Res. 2013;41(6):1779–87.
Tanvetyanon T, Finley DJ, Fabian T, Riquet M, Voltolini L, Kocaturk C, et al. Prognostic Nomogram to predict survival after surgery for synchronous multiple lung cancers in multiple lobes. J Thorac Oncol. 2015;10(2):338–45.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
The authors thank all doctors who responded to our screening and investigation, and appreciate Miss Taso Shu-Ping for the technical help and preparing this manuscript. This study was supported by a grant from Kaohsiung Veterans General Hospital, VGHKS103-015, VGHKS104-048, VGHKS105-064, Taiwan, R.O.C.
This study was supported by Grants from Kaohsiung Veterans General Hospital, VGHKS103–015, VGHKS104–048, VGHKS105–064, Taiwan, R.O.C.
Availability of data and materials
All results are available at the Kaohsiung Veterans General Hospital. The database used for the study can be available from the corresponding author under demand if needed.
Ethics approval and consent to participate
The Kaohsiung Veterans General Hospital Institutional Review Board approved the study and waived the requirement for informed consent due to the retrospective nature of this study.
Consent for publication
None of the authors have a conflict of interest to declare in relation to this work.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lin, KF., Wu, HF., Huang, WC. et al. Propensity score analysis of lung cancer risk in a population with high prevalence of non-smoking related lung cancer. BMC Pulm Med 17, 120 (2017). https://doi.org/10.1186/s12890-017-0465-8
- Non-smoker lung cancer
- Propensity score matching
- Lung adenocarcinoma spectrum
- Risk factor