Evaluation of the global lung function initiative 2012 reference values for spirometry in a Swedish population sample

Background: The Global Lung Function Initiative 2012 (GLI) reference values are currently endorsed by several respiratory societies, but evaluations of their applicability for adults resident in European countries are lacking. The aim of this study was to evaluate whether the GLI reference values are appropriate for an adult Caucasian Swedish population.
Methods: During 2008–2013, clinical examinations including spirometry were performed on general population samples in northern Sweden, in which 501 healthy Caucasian non-smokers were identified. Predicted GLI reference values and Z-scores were calculated for each healthy non-smoking subject, and the distributions and mean values for FEV1, FVC and the FEV1/FVC ratio were examined. The prevalence of airway obstruction among these healthy non-smokers was calculated based on the Lower Limit of Normal (LLN) criterion (lower fifth percentile) for the FEV1/FVC ratio. Thus, by definition, a prevalence of 5% was expected.
Results: The Z-scores for FEV1, FVC and FEV1/FVC were reasonably, although not perfectly, normally distributed, but not centred on zero. Both predicted FEV1 and, in particular, predicted FVC were lower than the observed values in the sample. The deviations were greater among women than among men. The prevalence of airway obstruction based on the LLN criterion for the FEV1/FVC ratio was 9.4% among women and 2.7% among men.
Conclusions: The use of the GLI reference values may produce biased prevalence estimates of airway obstruction in Sweden, especially among women. These results demonstrate the importance of validating the GLI reference values in different countries.
Electronic supplementary material: The online version of this article (doi:10.1186/s12890-015-0022-2) contains supplementary material, which is available to authorized users.


Background
Reference values for spirometry are necessary for identifying subjects with abnormal lung function. The European Coal and Steel Community (ECSC) reference values [1] have until recently been recommended for European countries by the European Respiratory Society (ERS). In Sweden, two sets of domestic reference values have been widely used [2][3][4]. In 2012, the Global Lung Function Initiative (GLI), an ERS task force, presented new multi-ethnic reference values for spirometry [5] for several different ethnicities, spanning ages 3 to 95 years. These GLI reference values are currently endorsed by several respiratory societies [5,6]. For Caucasians, the GLI reference values are based on data from asymptomatic lifelong non-smokers from 30 different centres comprising 57,395 subjects with European ancestry from several European countries including Sweden, along with Israel, Australia, USA, Canada, Brazil, Chile, Mexico, Uruguay, Venezuela, Algeria and Tunisia. They have been evaluated and found to be applicable for the Australasian population aged 4-80 years [7] as well as for British children [8], but do not reflect data for Tunisian adults very well [9]. Further evaluations from other parts of the world are required in order to verify their appropriateness in these areas. To date, there are no publications evaluating the applicability of the GLI reference values for Caucasian adults resident in any of the European countries.
The definition of airway obstruction is based on the ratio between forced expiratory volume in one second (FEV1) and the vital capacity (VC) measured by slow (SVC) and/or forced (FVC) manoeuvres. According to the Global Initiative on Obstructive Lung Disease (GOLD), a post-bronchodilator ratio < 0.7 is defined as not fully reversible airway obstruction [10]. The ERS and American Thoracic Society (ATS) recommend the use of the Lower Limit of Normal (LLN), defined as the lower fifth percentile of the distribution for healthy non-smokers, to define an abnormally low ratio [11][12][13]. The LLN definition is dependent on the set of reference values in use. Consequently, when implementing the LLN criterion in a population of healthy non-smoking subjects, a prevalence of obstruction of 5% indicates perfect applicability of the reference values in use.
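The LLN criterion described above can be sketched in a few lines. This is a minimal illustration, assuming that Z-scores follow a standard normal distribution in the reference data, so that the lower fifth percentile corresponds to a Z-score of about −1.645. The function names and the sample Z-scores are hypothetical and are not taken from the study.

```python
from statistics import NormalDist

# Z-score cut-off for the lower fifth percentile of a standard normal
# distribution (assumption: reference Z-scores are standard normal).
LLN_Z = NormalDist().inv_cdf(0.05)  # approximately -1.645


def below_lln(z_ratio: float, cutoff: float = LLN_Z) -> bool:
    """Return True if a FEV1/FVC Z-score falls below the LLN cut-off."""
    return z_ratio < cutoff


def prevalence_below_lln(z_scores: list[float], cutoff: float = LLN_Z) -> float:
    """Proportion of subjects whose Z-score is below the cut-off."""
    if not z_scores:
        raise ValueError("z_scores must not be empty")
    return sum(below_lln(z, cutoff) for z in z_scores) / len(z_scores)


if __name__ == "__main__":
    # Hypothetical Z-scores, for illustration only.
    sample = [-2.1, -1.7, -0.4, 0.0, 0.3, 1.2, -1.0, 0.8, -0.2, 1.9]
    print(f"LLN cut-off (Z): {LLN_Z:.3f}")
    print(f"Prevalence below LLN: {prevalence_below_lln(sample):.1%}")
```

In a healthy reference sample that matches the reference equations perfectly, the proportion returned by `prevalence_below_lln` would be expected to be close to 5%, apart from sampling variation.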
It is of great importance that the population from which the reference values are derived is representative for the population under study. The age distribution and other anthropometric, ethnic, environmental and socioeconomic factors should be equivalent since such factors can affect lung function. Additionally, the methodology for performing spirometric measurements in terms of protocol and equipment etc. must be stringent [11,14].
The aim of the present study was to evaluate if the GLI reference values, although endorsed by several respiratory societies including the ERS and ATS, are applicable for an adult Caucasian population resident in Sweden.

Study design and reference population
The study sample was recruited from the population-based cohorts of the Obstructive Lung Disease in Northern Sweden (OLIN) Studies. In 2006, a randomly selected cohort aged 20-69 years (n = 7,997) was invited to a postal questionnaire survey, together with a follow-up of a previously recruited, randomly selected cohort aged 30-84 years in 2006 (n = 7,004). Of the responders (n = 12,055, 80.4%), 1,016 subjects were randomly selected after stratification reflecting the age and sex distribution of the general population in the study area. They were invited to examinations including structured interviews and lung function tests, in which 726 subjects (71.5%) participated. In order to obtain a sufficient number of healthy non-smokers, an additional sample of 738 healthy non-smokers according to the 2006 questionnaire survey was also invited to the examinations, and 448 (60.7%) participated. All examinations were performed during 2008-2013 and, in total, 501 Caucasians (49% women) were identified as healthy non-smokers with acceptable spirometry quality; these constitute the reference population.
Healthy non-smokers [15] were defined as subjects without a history of any airway or lung disease, breathlessness, cough, wheeze, ischemic heart disease or rheumatic disorders, and without a lifetime smoking exposure of more than one pack-year. Their characteristics are described in Table 1. Eligibility criteria are presented in an appendix [see Additional file 1]. Age was calculated to one decimal place as the difference between date of birth and date of examination. Height was measured in stocking feet with a stadiometer with 0.5 cm precision. Weight was measured with 0.5 kg precision without jacket and shoes and with empty pockets. Written informed consent for participation in the study was obtained from the participants, and the study was approved by the Regional Ethical Review Board at Umeå University, Sweden.

Spirometric measurements
Spirometric measurements included FEV1, FVC and SVC, performed on two Jaeger Masterscope spirometers (JLAB version 5.21, CareFusion, Würzburg, Germany). The spirometers were calibrated each morning, and a minimum of three and a maximum of eight measurements were performed per subject. The procedures were performed without use of bronchodilators and followed the ATS/ERS recommendations [16], but with a repeatability criterion of ≤5% deviation from the second highest value [17]. Once the data had been exported from the spirometers to an electronic data file, data validity controls were performed. The repeatability criterion was not met for 2% of the measurements, and each such measurement was thoroughly examined by post-hoc visual inspection of the flow-volume charts, resulting in the exclusion of two subjects.

Statistical analysis
The GLI reference values are based on pre-bronchodilator values, and only pre-bronchodilator values from the reference population were collected and analysed. Using the Excel macro for GLI [18], reference values, lower limit of normal (LLN), Z-scores and percentiles for FEV1, FVC and the FEV1/FVC ratio were calculated for each subject in the reference population. The GLI Z-score is a standardized measure of the position of an observed measurement in the distribution of the population from which the GLI reference values are derived, and takes both between-subject variability and age- and height-related variability into account. If the agreement between the observed values in the reference population and the GLI reference values is perfect, the mean Z-score should be zero and the standard deviation (SD) should be one [19]. Mean values and standard deviations were calculated, and Q-Q plots were scrutinized to determine whether the Z-scores were normally distributed. Possible relationships between Z-scores and age, height, weight and sex were examined by multiple linear regression models. For comparison, a number of other commonly used reference values were also applied to the reference population, i.e. ECSC [1] and Hankinson [20], and from Scandinavia, Berglund [2], Hedenström [3,4], Langhammer [21], Gulsvik [22] and Viljanen [23]. Mean observed measurements, expressed as percent of predicted, were calculated across the different sets of reference values.
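To make the Z-score and LLN calculations more concrete, the following is a minimal sketch. The text above states only that the GLI Excel macro was used; the LMS form shown here (predicted median M, coefficient of variation S, skewness L) is how the GLI equations are generally described, and it is included as an assumption rather than as the macro's actual implementation. All parameter values in the example are hypothetical; real values depend on sex, age, height and ethnic group.

```python
import math
from statistics import NormalDist, mean, stdev

Z_5TH = NormalDist().inv_cdf(0.05)  # about -1.645, the fifth percentile


def lms_z_score(observed: float, L: float, M: float, S: float) -> float:
    """Z-score of an observed value under the LMS method (assumed form).

    M is the predicted (median) value, S the coefficient of variation and
    L the skewness (Box-Cox power) for the subject's sex, age and height.
    """
    if L == 0:
        return math.log(observed / M) / S
    return ((observed / M) ** L - 1.0) / (L * S)


def lms_lln(L: float, M: float, S: float, z: float = Z_5TH) -> float:
    """Measurement value at the lower limit of normal (fifth percentile)."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def summarise(z_scores: list[float]) -> tuple[float, float]:
    """Mean and sample SD of Z-scores; perfect agreement gives about (0, 1)."""
    return mean(z_scores), stdev(z_scores)


if __name__ == "__main__":
    # Hypothetical LMS parameters and observation, for illustration only.
    L, M, S = 1.2, 4.00, 0.12
    print(f"Z-score for 4.30 L: {lms_z_score(4.30, L, M, S):.2f}")
    print(f"LLN: {lms_lln(L, M, S):.2f} L")
    print("Mean, SD:", summarise([0.4, -0.2, 1.1, 0.3, -0.6]))
```

An observation exactly equal to the predicted value M gives a Z-score of zero, and the value returned by `lms_lln` maps back to a Z-score of about −1.645.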

Results
The distribution of GLI Z-scores
When applying the GLI reference values [5] to the reference population, the Z-scores for FEV1, FVC and FEV1/FVC were reasonably, although not perfectly, normally distributed, but not centred around zero. The mean Z-score was 0.21 (SD 0.91) for FEV1, 0.35 (SD 0.92) for FVC and −0.25 (SD 0.85) for the FEV1/FVC ratio, and differed significantly from zero for all three measures [Table 2]. Mean Z-scores deviated significantly from zero in the same age groups in which mean percent of predicted deviated significantly from 100%, as displayed in Figures 1 and 2.
[Table 2: Mean GLI Z-scores for FEV1, FVC and the FEV1/FVC ratio by age group and sex]
For FEV1, 93.6% (94.3% among women, 93.0% among men) of the 501 observations were within the 90% limits of normality. For FVC, the corresponding proportion was 90.2% (89.3% among women, 91.1% among men), and for the FEV1/FVC ratio, the proportion was 92.2% (88.5% among women, 95.7% among men). However, only 2.0% of the women and 1.6% of the men had FEV1/FVC values above the 95th percentile. The pattern of Z-scores was the opposite for FEV1 and FVC. For FEV1, 0.4% of the women and 1.2% of the men had values below the fifth percentile, and 5.3% of the women and 5.8% of the men had values above the 95th percentile. For FVC, 0.8% of the women and 1.2% of the men had values below the fifth percentile, and 9.8% of the women and 7.8% of the men had values above the 95th percentile. The GLI percentile frequency distributions for FEV1, FVC and the FEV1/FVC ratio were shifted, particularly among women, towards higher percentiles for FEV1 and FVC, and towards lower percentiles for FEV1/FVC [Figure 3].
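The effect of a shifted mean and a reduced SD on the proportions outside the 90% limits can be illustrated with a short calculation. This sketch assumes that the Z-scores in the sample are exactly normally distributed with the reported overall means and SDs, which the Q-Q plots support only approximately; the function name is hypothetical. The resulting proportions are therefore rough expectations, not a reproduction of the observed percentages.

```python
from statistics import NormalDist

# Limits of the central 90% of a standard normal distribution.
UPPER = NormalDist().inv_cdf(0.95)  # about +1.645
LOWER = -UPPER                      # about -1.645


def expected_tails(mean_z: float, sd_z: float) -> tuple[float, float]:
    """Expected proportions below the 5th and above the 95th GLI percentile,
    assuming sample Z-scores are normal with the given mean and SD."""
    dist = NormalDist(mu=mean_z, sigma=sd_z)
    below = dist.cdf(LOWER)
    above = 1.0 - dist.cdf(UPPER)
    return below, above


if __name__ == "__main__":
    # Overall mean Z-scores and SDs as reported in the Results.
    reported = {
        "FEV1": (0.21, 0.91),
        "FVC": (0.35, 0.92),
        "FEV1/FVC": (-0.25, 0.85),
    }
    for name, (m, s) in reported.items():
        below, above = expected_tails(m, s)
        print(f"{name}: {below:.1%} below, {above:.1%} above")
```

With a mean of zero and an SD of one, both tails are 5%. A positive mean shifts observations towards the upper tail, while an SD below one narrows both tails.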

Factors related to GLI Z-scores
When analysing the Z-scores for FEV1, FVC and the FEV1/FVC ratio in relation to age, height, weight and sex, small but statistically significant associations yielding R-square values of 1-2% were found. Sex had a significant impact on the Z-score for the FEV1/FVC ratio (Beta-coefficient 0.256, p = 0.001); sex and height had a significant impact on the Z-score for FEV1 (Beta-coefficient 0.276, p = 0.023 for sex; −0.016, p = 0.005 for height); and weight had a significant impact on the FVC Z-score (Beta-coefficient −0.009, p = 0.001).

Prevalence of airway obstruction according to LLN based on GLI reference values
The prevalence of spirometrically defined airway obstruction in the reference population according to LLN for the FEV1/FVC ratio was 9.4% (95% CI 5.7%-13.1%) among women and 2.7% (95% CI 0.7%-4.7%) among men (p-value = 0.002). When applying the lower 2.5th GLI percentile as LLN, the prevalence of obstruction in the reference population was 3.4% (95% CI 1.1%-5.7%) among women and 1.5% (95% CI 0.0%-3.0%) among men. Figure 4 illustrates that subjects defined as having obstruction according to the LLN criterion tended to have Z-scores for FEV1 below rather than above zero and Z-scores for FVC above rather than below zero.
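A prevalence estimate with its 95% confidence interval can be sketched as follows. The method used for the intervals above is not stated, so the normal-approximation (Wald) interval here is an assumption, and the function name is hypothetical. The counts in the example are also hypothetical: they were chosen only to give a prevalence close to the reported 9.4% and are not taken from the study data.

```python
import math
from statistics import NormalDist


def wald_ci(cases: int, n: int, level: float = 0.95) -> tuple[float, float, float]:
    """Proportion with a normal-approximation (Wald) confidence interval.

    Returns (proportion, lower bound, upper bound), with bounds clipped
    to the range 0..1.
    """
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("require n > 0 and 0 <= cases <= n")
    p = cases / n
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, low, high = wald_ci(cases=23, n=244)
    print(f"Prevalence {p:.1%} (95% CI {low:.1%}-{high:.1%})")
```

The Wald interval is simple but can behave poorly for small counts, where its lower bound is clipped at zero; alternatives such as the Wilson interval exist and may be preferable in that setting.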

Comparison to other reference values
When other sets of reference values were applied to the reference population, the mean FEV1 and FVC as percent of predicted significantly exceeded 100% for both sexes according to ECSC, Hankinson and Berglund. Mean FEV1 and FVC as percent of predicted according to Hedenström (Sweden), Langhammer (Norway), Gulsvik (Norway) and Viljanen (Finland) were closer to 100%. In general, the FVC percent of predicted values exceeded 100% by more than the FEV1 values did for all sets of reference values, and particularly so among women. Consequently, FEV1/FVC and FEV1/VC as percent of predicted were generally below 100%, and more so among women than among men [Figure 5].

Discussion
Compared to the ECSC reference values, the GLI reference values are superior, but not perfect, for Swedish adults. The original intention of GLI was that the same reference values should be usable in most parts of the world, covering different ethnicities and ages, to avoid age-related junction points between different sets of reference values. The data on which the GLI equations are based were collected from 1978 to 2008, which raises the question of whether the oldest data are still valid. However, earlier studies by Quanjer et al. found no evidence of secular trends in FEV1, FVC or FEV1/FVC in Caucasians during the last 30 years. They also found that reference equations derived from collated datasets, such as the GLI, are applicable across different centres using different equipment, which is another strong argument for using GLI [24].
The GLI reference values represent the average of all available data they are based on and may thus not be representative for every specific subpopulation included. Since there are substantial differences in e.g. occupational exposures and environmental pollution which may affect lung function between countries and regions populated by Caucasians, differences in lung function can be expected. Data from Swedish centres are included in the reference data from which the GLI reference values are derived, but comprise only 123 subjects. Since there are substantial differences in anthropometric, environmental and socio-economic factors between e.g. Scandinavia and southern Europe, an evaluation of the fit for Swedish subjects is required.
Swanney et al. [25] argue that adopting the GLI reference values in clinical practice worldwide is essential and urgent, in order to reduce the confusion regarding which reference values to rely on. In essence, Swanney et al. argue that the use of GLI worldwide is preferable to local reference values obtained with different techniques, especially since the GLI reference values have been evaluated and considered applicable for both Caucasian adults and children [7,8]. Similar points have also been made previously by Stanojevic et al. [14]. However, although the GLI reference values may be applicable for Caucasian populations in several countries, the present findings demonstrate that there are differences between countries that have to be considered.
The OLIN Studies have conducted research on obstructive lung disease in Northern Sweden since 1985 [26], and the research staff carrying out the spirometric measurements are highly experienced. The sampling of the reference population was rigorous, as were the data quality and repeatability controls. The reference population originates from randomly selected healthy non-smokers of the general population of Norrbotten, the northernmost province of Sweden. Selection bias such as that arising from using health personnel [2], employees within certain industries [1,2,23] or subjects visiting a certain clinic is thus avoided. One of the strengths of this study is that the data are contemporary, i.e. collected from 2008 to 2013, and thus possible secular trends in this data set can be ruled out. It has previously been shown that 150 subjects of each sex is a sufficient sample size to make a reliable evaluation of the applicability of reference values for spirometry [24], and hence this evaluation can be considered reliable. A possible weakness of this study is that no data on cotinine levels were analysed to confirm non-smoking.
This Swedish study showed a positive offset for observed FEV1 and FVC compared to the GLI reference values, with mean Z-scores for FEV1 and FVC above the expected values for both sexes and across almost all ages. Mean values of FEV1 and, in particular, FVC as percent of predicted exceeded 100% to a greater extent among women than among men. In this study, FVC as percent of predicted also exceeded 100% according to most of the reference values from other areas, i.e. ECSC [1], Hankinson [20], Langhammer [21] and Gulsvik [22]. The GLI reference values yield results similar to Hankinson's, with percent of predicted values closer to 100% compared to the previously recommended ECSC reference values, in line with results from previous studies [6]. Reference values from Sweden [3,4] and Finland [23] yielded mean percent of predicted values closer to 100%. However, recent debate has criticised the use of percent of predicted because of the sex-, height- and age-related bias embedded in this measure, and has advocated the use of Z-scores instead [6,27].
The standard deviations for FEV1 and FVC Z-scores were close to 0.9 for both sexes, implying that the dispersion around the mean was lower in this sample than in the GLI data. Consequently, the LLN for these values may be "too low". Almost 10% of the subjects were outside the 90% limits of normality as defined by GLI (6.4% for FEV1, 9.8% for FVC), but most of those subjects were located above the 95th percentile. The authors of the study that evaluated the applicability of GLI in an Australasian population argue that Z-score deviations <0.5 (corresponding to <3% deviations) are clinically insignificant [7]. In this study, however, the deviation of 0.42 Z-scores for FVC among women represents a deviation of 6%. The classification into severity grades of airway and lung disease often relies on FEV1 or FVC as percent of a reference value, and thus the use of GLI may lead to invalid classification of disease severity in Sweden.
The mean predicted FEV1/FVC ratio was higher than the mean observed ratio, and more so among women than among men. The Z-score SDs for both sexes were consistently below 0.9 for the ratio, implying a lower variability in this Swedish dataset also for the ratio. Since the spirometric definition of airway obstruction relies on the ratio, the fact that the GLI predicted ratios are higher means that the prevalence of obstruction may be overestimated in Sweden. Additionally, since the dispersion around the ratio is lower in Sweden compared to GLI, use of the GLI LLN criteria may overestimate the prevalence of obstruction even further. LLN will by definition allow for a 1/20 false positive rate, and this study clearly illustrates that among healthy subjects, those identified as obstructive by the LLN criterion are in particular those with high FVC values.
Regardless of criteria for airway obstruction, the prevalence was higher among women than men in the reference population. If the agreement with GLI is perfect, no such sex-difference should exist when applying the LLN criteria of obstruction based on the GLI reference values. In this study, 9.4% of the women were identified as obstructive according to the GLI LLN criterion (fifth percentile), which indicates that this criterion may overestimate the prevalence of airway obstruction in Swedish women.

Conclusions
In conclusion, the GLI reference values are preferable to the ECSC reference values for Swedish adults. However, among non-smoking healthy men and women in northern Sweden, the mean values of FEV1 were somewhat larger than those in the non-smoking healthy GLI reference population. A greater discrepancy was found for FVC, especially among women. The use of the LLN criterion for airway obstruction based on the GLI reference values for the FEV1/FVC ratio may produce biased prevalence estimates of airway obstruction in Sweden, in particular among women. These results demonstrate the importance of validating the GLI reference values in different countries.