Validity of EQ-5D utility index and minimal clinically important difference estimation among patients with chronic obstructive pulmonary disease

Background The discriminatory ability of multi-attribute utility (MAU) measures compared to condition-specific measures (CSM) in assessing health-related quality of life (HRQoL) among patients with chronic obstructive pulmonary disease (COPD) is an unsettled issue. This study investigated the quality of life of patients with COPD with three different HRQoL instruments and examined whether they could differentiate between adjacent severity groups in a statistically and clinically meaningful manner. In the process, the minimal clinically important differences (MCID) of the EQ-5D utility index were estimated.
Methods Cross-sectional survey data were collected from patients with mild to very severe COPD in South Korea. In addition to demographic and clinical information, the following HRQoL questionnaires were used: the three-level five-dimensional Euro-Quality of Life tool (EQ-5D-3L), the EQ-Visual Analog Scale (EQ-VAS), and the Chronic Obstructive Pulmonary Disease Assessment Test (CAT). Patients' health-related quality of life was analyzed with reference to severity groups based on the Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification. To investigate the discriminatory ability of the HRQoL instruments between COPD severity groups, tests examining variance, covariance, and standardized mean difference were performed. After estimating the MCID of the EQ-5D utility index using the anchor-based method, we investigated whether the differences in the EQ-5D utility scores between groups exceeded the clinically meaningful minimum level.
Results A total of 298 patients completed this study. All the quality of life scores showed statistically significant differences between the GOLD severity groups. The pooled MCID estimate for the EQ-5D utility index was 0.028 (range: 0.017–0.033). Even after adjusting for other factors affecting quality of life, the EQ-5D utility index differentiated the GOLD groups well.
Conclusions We conclude that the EQ-5D utility index is a valid instrument for measuring the quality of life of patients with COPD, and the pooled MCID estimate for the EQ-5D utility index was 0.028.


Background
Chronic obstructive pulmonary disease (COPD) is characterized by airflow limitation, and it affects the quality of life of patients owing to various symptoms, acute exacerbation, and complications. There is no cure for COPD; the purpose of pharmacological and nonpharmacological treatments is therefore to mitigate symptoms and improve quality of life [1,2]. Accordingly, the health-related quality of life (HRQoL) of patients with COPD has been considered an important disease outcome in recent clinical studies [3,4]. Tools assessing HRQoL should be able to differentiate it according to the patient's clinical status [5]. The progression of COPD in utility-based decision analytic models is mainly based on discrete clinical stages or continuous changes in pulmonary function [6,7]. As a representative staging system, the Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification uses spirometric airflow limitation measured with the forced expiratory volume in 1 second (FEV1). FEV1 is the most useful predictor of clinical outcomes such as mortality and hospitalization rates, and it provides important information on the clinical status of patients at the population level. However, its association with symptoms or patients' quality of life is somewhat weak [8,9]. Pickard et al. [10] and Einarson et al. [11] reported that the utility measure reflects the clinical severity of COPD well, as evidenced by lower utility scores when disease severity is higher. However, many other studies have reported that the discriminatory ability of a multi-attribute utility (MAU) measure, such as the five-dimensional EuroQol (EQ-5D) utility index, is limited with respect to measuring the quality of life of patients with COPD [4,10-14]. Therefore, there is an increasing tendency to use condition-specific measures (CSMs), which are more sensitive to the detection of small treatment effects.
CSMs that are designed to capture the clinical consequences or small changes in a certain disease are preferred for clinical studies. However, as such measures are not preference based, their use in economic evaluation is extremely limited [15]. Among the existing preference-based utility measures, the EQ-5D utility index is the most preferred tool based on the multi-attribute utility theory (MAUT) because its valuation studies have been conducted in many countries. It is therefore necessary to examine the discriminatory ability of the EQ-5D utility index in evaluating the quality of life of patients with COPD. This differentiation is important not only in the statistical sense but also with respect to the clinical meaning of the results. Recently, as Patient Reported Outcomes (PROs) have come to play an important role in the evaluation of treatments or interventions, interest in minimal clinically important differences (MCID), which facilitate the interpretation of PRO scores, has been growing. However, although the MCID has been studied in several disease areas, there is as yet no consensus on the MCID of the EQ-5D utility index [16].
This study aims to investigate whether the EQ-5D utility index is a valid tool for assessing the quality of life of patients with COPD. To this end, we examined the quality of life of patients with COPD using data from a cross-sectional patient survey. Additionally, we estimated the MCID of the EQ-5D utility index using various established methods. We then tried to determine whether the EQ-5D utility index usefully differentiates statistically and clinically between the severity groups.

Methods
Study subjects
A multicenter, non-interventional, cross-sectional study was conducted from August to December 2014, at the pulmonary divisions of three teaching hospitals in Seoul, South Korea. The study protocol was approved by the Institutional Review Board (IRB) at each hospital (approval numbers: ED14144; KUGH14146; KC14QIMI0470) and all patients were informed about the study and consented to participate by signing the form recommended by the IRBs. Study subjects had mild to very severe COPD diagnosed before 1 January 2013. The inclusion criteria were age over 40 years, FEV1/forced vital capacity (FVC) ratio of less than 70% after bronchodilator administration, and previous or current smoking history of 10 or more pack-years [1,12]. The exclusion criteria were an exacerbation within the previous 6 weeks, a cardiovascular event within the previous 3 months, or participation in other clinical trials since January 2013. Excluding patients with recent acute events ensured that participants were clinically stable, which is desirable when measuring quality of life.

Clinical data
Demographic information was collected, including each patient's age, gender, BMI, smoking history, and socioeconomic status. Additionally, clinical information was gathered, including the duration of diagnosis, pulmonary function, comorbidities (including angina, myocardial infarction (MI), congestive heart failure (CHF), atrial fibrillation (AF), hypertension, diabetes, metabolic syndrome, gastroesophageal reflux disease (GERD), osteoporosis, anxiety or depression, lung cancer, asthma, arthritis, and anemia), history of tuberculosis, and prescription records. Information on resource utilization for 1 year was also collected. All these data were collected from recent medical records at the time of the survey.
COPD exacerbation is generally defined as a sustained worsening of the patient's condition beyond normal day-to-day variations that is acute in onset and necessitates a change from the usual medication [17,18]. For the present study, we operationally defined COPD exacerbations as cases in which the patient was prescribed oral corticosteroids and antibiotics simultaneously. Hospital admission status was also investigated.

EuroQol
The EQ-5D has been used widely in a variety of clinical areas and countries to evaluate health-related quality of life. This questionnaire consists of a descriptive section defining health status (EQ-5D-3L) and a single value that captures a self-rating of health status on a Visual Analog Scale (EQ-VAS). The descriptive section includes the five dimensions of mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension is divided into the following three levels: no problems, some or moderate problems, and extreme problems. The resulting health state can be defined by a 5-digit number that combines the levels from each of the five dimensions (e.g., 11231). The EQ-VAS rating is a quantitative measure ranging from 100 (representing the best imaginable health state) to 0 (representing the worst imaginable health state) [19]. The information derived from the EQ-5D self-classifier can also be converted into a single summary index (the EQ-5D utility index) by applying scores from a national valuation set generated from a population-based preference survey. The value set used in the present study was derived from a representative South Korean population sample of 1307 subjects using the time trade-off (TTO) method [21]. A total of 101 health states were directly valued and used to develop the model for all 243 health states defined by the EQ-5D-3L. As in many other countries, South Korean health authorities recommend the EQ-5D for cost-effectiveness analyses.
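As an illustration of the descriptive system described above, the following minimal Python sketch enumerates the 243 possible EQ-5D-3L health states and decodes a 5-digit state such as 11231. The function names `parse_state` and `all_states` are hypothetical and chosen for this example only. The sketch does not convert states into utilities, because that step requires the South Korean value set [21], which is not reproduced here.

```python
from itertools import product

# The five EQ-5D dimensions, in the order in which the 5-digit state is written.
DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)

# The three response levels of the 3L version.
LEVELS = {1: "no problems", 2: "some or moderate problems", 3: "extreme problems"}


def parse_state(code: str) -> dict:
    """Map a 5-digit EQ-5D-3L state (e.g. '11231') to the level of each dimension."""
    if len(code) != 5 or any(ch not in "123" for ch in code):
        raise ValueError(f"not a valid EQ-5D-3L state: {code!r}")
    return {dim: int(ch) for dim, ch in zip(DIMENSIONS, code)}


def all_states() -> list:
    """Enumerate every health state: 3 levels on 5 dimensions gives 3**5 = 243."""
    return ["".join(str(level) for level in combo) for combo in product((1, 2, 3), repeat=5)]


if __name__ == "__main__":
    state = parse_state("11231")
    for dim, level in state.items():
        print(f"{dim}: {LEVELS[level]}")
    print(len(all_states()), "possible health states")
```

The state 11111 corresponds to full health, which is the state used later in the paper when ceiling effects are discussed.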

CAT
The CAT has been developed for understanding and grading the impact of COPD on health-related quality of life [22]. It is short, simple, and easy to self-administer, and it is designed for routine use in clinical practice. It comprises eight items covering cough, phlegm, chest tightness, breathlessness, activity limitation, confidence, sleeping, and energy, and each item is scored from 0 to 5 points. The total score ranges from 0 to 40, with higher scores indicating worse quality of life. The linguistic validity and reliability of the South Korean version of the questionnaire used in this study have been verified [23,24]. The recent GOLD guidelines recommend the use of the Modified Medical Research Council (mMRC) or CAT for symptoms assessment in patients with COPD.
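The CAT scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the item labels follow the wording in the text, and the function name `cat_total` and the dictionary-based input format are assumptions made for this example, not part of the official instrument.

```python
# Item labels as listed in the text; the official questionnaire wording differs.
CAT_ITEMS = (
    "cough",
    "phlegm",
    "chest tightness",
    "breathlessness",
    "activity limitation",
    "confidence",
    "sleeping",
    "energy",
)


def cat_total(responses: dict) -> int:
    """Sum the eight CAT items (each scored 0-5) into a total score of 0-40.

    Higher totals indicate worse health-related quality of life.
    """
    missing = [item for item in CAT_ITEMS if item not in responses]
    if missing:
        raise ValueError(f"missing CAT items: {missing}")
    for item in CAT_ITEMS:
        score = responses[item]
        if not isinstance(score, int) or not 0 <= score <= 5:
            raise ValueError(f"{item}: score must be an integer from 0 to 5")
    return sum(responses[item] for item in CAT_ITEMS)


if __name__ == "__main__":
    example = dict.fromkeys(CAT_ITEMS, 2)  # hypothetical patient answering 2 on every item
    print(cat_total(example))
```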

Data analysis
Description of sample
Participants' characteristics were grouped by GOLD severity and summarized using mean and standard deviation (SD) for continuous variables, and frequencies for categorical variables. Analysis of variance (ANOVA) and chi-square tests were then performed to test the differences between groups.

Investigation of validity
We examined the validity of the EQ-5D utility index both statistically and clinically. The cross-sectional construct validity of the instruments, in terms of their ability to differentiate between health states, was tested using known severity groups. In addition, all the utility differences between groups were compared to the MCID to determine whether they were clinically meaningful. For construct validity, in addition to ANOVA, we used post-hoc analysis to determine whether there was a significant difference in quality of life scores between neighboring groups. To investigate how sensitively the instruments discriminated between the health status of patients with COPD in different groups, the effect size was calculated using Cohen's d (based on the standardized mean difference between two populations) [25]. A large effect size indicates that a consistent difference between group means can be readily detected. Owing to the skewed nature of the EQ-5D utility index data, both nonparametric and parametric correlation coefficients (Spearman's rank and Pearson correlation coefficients) were calculated to assess the association between the HRQoL scores and lung function. Analysis of covariance (ANCOVA) was used to compare the quality of life of each group after adjusting for factors that might affect the quality of life of patients with COPD, and the Least Significant Difference (LSD) was used for post-hoc analysis. Background characteristics such as age, gender, smoking years, insurance type, employment status, and number of comorbidities were used as control variables in these analyses. Although the assumptions of normality and constant variance were not met in the EQ-5D utility index data, the distributions of EQ-VAS and CAT scores did not deviate significantly from normal distributions. Therefore, we decided to use parametric analyses and applied ANOVA/ANCOVA. 
Additionally, the Kruskal-Wallis test and the subsequent Wilcoxon rank-sum test to compare two specific groups were conducted; the results of these non-parametric analyses are presented in the supplementary Table (S1). All statistical analyses were conducted using IBM SPSS 22 and STATA v.13.
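The effect size calculation can be illustrated with a short Python sketch. It assumes the common pooled-standard-deviation form of Cohen's d; the text does not state which variant was used, so this choice, the function name `cohens_d`, and the utility values in the usage example are all illustrative assumptions rather than study data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups using the pooled SD.

    d = (mean_a - mean_b) / sqrt(pooled variance), where the pooled variance
    weights each group's sample variance by its degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical utility scores for two neighboring severity groups.
    milder = [0.90, 1.00, 0.80]
    more_severe = [0.70, 0.80, 0.60]
    print(round(cohens_d(milder, more_severe), 2))
```

By Cohen's conventional benchmarks, values of about 0.2, 0.5, and 0.8 are read as small, medium, and large effects, respectively.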

MCID estimation
The MCID was estimated to determine whether utilities by severity group reflect clinically meaningful differences. The MCID was first used by Jaeschke et al. in 1989 to identify the smallest change that is important to patients. It can be estimated using distribution-based and anchor-based methods [26-30].
Distribution-based method The distribution-based method uses the variation measured in the PRO score. In this study, we used the following three assumptions.
(1) 1/2 SD approach: It was assumed that 0.5*SD would correspond to the MCID. The SD is the variation among individual scores, and approximately half an SD appeared to be the limit of discriminability of changes based on a psychological theory, which has been empirically derived in many health-related quality of life studies of chronic disease and thus does not constitute an arbitrary statistical threshold [26,31]; (2) SEM approach: After estimating measurement error using the formula for the standard error of the measurement, $\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - \text{reliability of the measure}}$, we assumed that the SEM would correspond to the MCID. Test-retest reliability refers to the consistency of a measurement, and it can be assessed with the Intraclass Correlation Coefficient (ICC), which is frequently reported as a common metric in reliability studies [32]. The ICC was obtained from a validity study of the EQ-5D-3L involving a representative sample of the South Korean population (0.61) [33]. Where the reliability of a measure is less than 0.75, 1 SEM may be a more stringent criterion than 0.5 SD [31]; (3) Cohen's approach: Using Cohen's formulation, which is generally accepted as a benchmark, we assumed that the SD of the utility score multiplied by 0.2, corresponding to a small effect size, would correspond to the MCID [27].
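The three distribution-based rules above reduce to simple arithmetic on the sample SD and the reliability coefficient, as the following Python sketch shows. The function name `distribution_based_mcid` is hypothetical. The SD of 0.146 used in the usage example is not reported in the text; it is an assumption back-calculated from the 0.5*SD estimate of 0.073 given in the Results, and is used here only to show that the three reported preliminary estimates are mutually consistent with an ICC of 0.61.

```python
from math import sqrt


def distribution_based_mcid(sd: float, reliability: float) -> dict:
    """Return the three distribution-based MCID estimates described in the text.

    half_sd     : 0.5 * SD
    sem         : SD * sqrt(1 - reliability), the standard error of measurement
    cohen_small : 0.2 * SD, Cohen's benchmark for a small effect size
    """
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return {
        "half_sd": 0.5 * sd,
        "sem": sd * sqrt(1 - reliability),
        "cohen_small": 0.2 * sd,
    }


if __name__ == "__main__":
    # ASSUMPTION: sd=0.146 is back-calculated from the reported 0.5*SD estimate
    # (0.073); reliability=0.61 is the ICC cited in the text [33].
    estimates = distribution_based_mcid(sd=0.146, reliability=0.61)
    for name, value in estimates.items():
        print(f"{name}: {value:.3f}")
```

With a reliability of 0.61, the multiplier for the SEM is sqrt(0.39), about 0.62, which is larger than 0.5; this is why the SEM criterion is the more stringent of the two when reliability falls below 0.75.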
Anchor-based method The anchor-based approach is a method of comparing changes in the PRO score using an anchor or external criterion. The external criterion should have a proven association with the PROs, and a minimum correlation of 0.30-0.35 is recommended [28]. Considering both statistical correlation and clinical relevance to the EQ-5D utility index, we selected the FEV1 % predicted, EQ-VAS, and CAT scores as anchors. The FEV1 is the most common clinical indicator of COPD prognosis in terms of repeatability. However, it is reasonable to use the MCID for the FEV1 % predicted value instead of the FEV1 alone because this was not a longitudinal study involving data on changes in pulmonary function values for individual patients. Although no MCID study has utilized the FEV1 % predicted, a 5-10% difference from the baseline is considered clinically significant and a difference of less than 3% clinically insignificant [34,35]. Therefore, we used 5-10% differences in the FEV1 % predicted as an anchor. The MCIDs of the EQ-VAS and CAT scores for patients with COPD have been estimated to be 8 [36] and between 2 and 3 in most previous studies [37-41], respectively. Additionally, we constructed a simple regression model in which the external criterion used as an anchor was the independent variable and the utility was the dependent variable. The MCID of the utility index was estimated by multiplying the known MCID of the external criterion by the coefficient of the regression model. The relevance of each anchor to the EQ-5D utility index was computed as Spearman's rank correlation coefficient.
The MCID estimated with distribution-based methods has several disadvantages. It overlooks clinical significance and depends on the sample variability, so it fails to fulfill the original intention of the MCID, which is to distinguish clinical from statistical significance. Moreover, there is as yet no consensus on which method provides a better estimation of the MCID. In this study, we employed several complementary distribution-based methods, and the results are presented as preliminary estimates of the MCID. Finally, the range and weighted mean value of the MCID were estimated using the anchor-based method. The weight used here was the correlation coefficient between the EQ-5D utility index and each anchor.
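The anchor-based procedure described above (regress utility on an anchor, scale the slope by the anchor's known MCID, then pool across anchors with correlation weights) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the absolute value of each correlation is used as the weight (the text does not say how negatively correlated anchors such as the CAT were handled), the text does not specify how anchors with an MCID range entered the pooled estimate, and all numbers in the usage example are invented for illustration rather than taken from the study.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares regression of y (utility) on x (anchor)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("anchor values must not all be identical")
    return sxy / sxx


def anchor_mcid(slope: float, anchor_threshold: float) -> float:
    """Utility MCID implied by an anchor: |regression slope| x known anchor MCID."""
    return abs(slope) * anchor_threshold


def pooled_mcid(estimates, correlations):
    """Correlation-weighted mean of per-anchor MCID estimates.

    ASSUMPTION: weights are absolute correlations, so that anchors which are
    negatively related to utility still receive a positive weight.
    """
    weights = [abs(r) for r in correlations]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical data: CAT totals and EQ-5D utilities for five patients.
    cat_scores = [5, 10, 15, 20, 25]
    utilities = [0.95, 0.90, 0.85, 0.80, 0.75]
    slope = ols_slope(cat_scores, utilities)
    low, high = anchor_mcid(slope, 2), anchor_mcid(slope, 3)  # CAT MCID of 2 to 3 points
    print(f"CAT-anchored MCID range: {low:.3f} to {high:.3f}")
    # Hypothetical per-anchor estimates and correlations with utility.
    print(f"pooled: {pooled_mcid([0.02, 0.04], [0.5, -0.5]):.3f}")
```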

Results
Descriptive data of the sample A total of 298 patients completed the study. Table 1 summarizes the clinical and demographic characteristics of patients according to the GOLD severity classification. In total, 32 patients had mild COPD; 156 patients, who formed the majority, had moderate COPD; 90 patients had severe COPD; and 20 patients had very severe COPD. The mean age was about 69 years, and 86% of the participants were male. The mean BMI was 22.8 kg/m², and the higher the severity of COPD, the lower the BMI, with significant differences among groups. Further, 26% of all patients and 75% of GOLD 4 patients had been treated with three or more therapies. The mean frequency of acute exacerbations over the past year was 0.5, and 15.8% of all patients received hospital treatment more than twice a year, or had more than one admission a year, due to acute exacerbation. With reference to comorbidities, 47% of the patients had cardiovascular disease and 37% had hypertension. None of the comorbidities showed a difference in the quality of life score of the patients based on severity groups.

Construct validity
The EQ-5D utility index, EQ-VAS, and CAT scores are presented as mean scores according to the GOLD severity group (Table 2). All the quality of life scores worsened with an increase in disease severity, and there were statistically significant differences between groups. The post-hoc analysis revealed that the scores on the EQ-5D utility index and CAT differed between all neighboring groups, except between GOLD 1 and GOLD 2. The effect size was computed to identify the degree of difference in the quality of life of patients by severity group. Standardized mean differences between neighboring groups were greater than 0.2, which confirmed that the severity of pulmonary function impairment affects the quality of life of patients. In particular, even between the GOLD 1 and GOLD 2 groups, which did not reveal a statistically significant difference in quality of life scores, there were moderate differences in the EQ-5D utility index according to the effect size analysis. Among the quality of life instruments used in this study, the EQ-5D utility index best captured the differences in quality of life between the GOLD 1 and GOLD 2 groups and between the GOLD 3 and GOLD 4 groups. On the other hand, the CAT score was best at discriminating between the GOLD 2 and GOLD 3 groups. Additionally, among the quality of life instruments, the EQ-5D utility index was most highly correlated with lung function (Spearman's ρ = 0.422 and −0.380 for the FEV1 % predicted and the GOLD severity grade, respectively; Pearson's correlation coefficients were similar to Spearman's coefficients). Table 3 presents the mean quality of life score by GOLD group after controlling for factors that may affect the quality of life apart from COPD severity. 
When controlling for age, gender, number of pack-years of smoking, insurance type, employment status, and number of comorbidities, the differences in the quality of life scores between groups decreased compared to those computed without controlling for these factors. However, statistically significant differences remained. Post-hoc analysis of the differences between neighboring groups showed similar results as in the former analysis. However, CAT scores did not differ significantly between the GOLD 3 and GOLD 4 groups after adjustment.
Table 3 notes: HRQoL, Health-Related Quality of Life; FEV1, forced expiratory volume in 1 s. *p < 0.001; the p-value for severity group was calculated by ANCOVA. Covariates: age, gender, smoke years, insurance type, employment status, number of comorbidities. The same superscript letters (a, b, c) indicate a non-significant difference between GOLD groups based on the LSD post-hoc test (p < 0.05).

MCID
The preliminary estimates of the MCID using the distribution-based methods were 0.073, 0.091, and 0.029 for the 0.5*SD, SEM, and Cohen's approaches, respectively. A simple linear regression was used to estimate the MCID from clinically relevant external criteria (anchors) (Table 4).
The MCID estimate corresponding to a difference of 2 to 3 points, which is the known MCID of the CAT score, was 0.021-0.031 (95% confidence interval (CI): 0.018-0.035). When the EQ-VAS score or FEV1 % predicted value was used as an external criterion, the respective estimates were 0.033 (95% CI: 0.027-0.040) and 0.017-0.033 (95% CI: 0.012-0.042). The external criterion most relevant to the utility score was the CAT score (adjusted R² = 0.41). The MCID range for the participants' EQ-5D utility index scores was 0.017-0.033 (95% CI: 0.012-0.042), and the pooled estimate weighted by the correlation coefficients was 0.028 (95% CI: 0.023-0.034). This was consistent with the MCID estimated by the distribution-based method using Cohen's effect size (0.029). Even after adjusting for other factors affecting quality of life (Table 3), we confirmed that all the utility differences between groups exceeded the minimal clinically important difference. In contrast, for the CAT and EQ-VAS scores, not all the differences between groups exceeded the MCID.

Discussion
In this study, the health-related quality of life of patients with COPD was measured using the EQ-5D utility index, EQ-VAS, and CAT. Among these instruments, we examined whether the EQ-5D utility index, as a generic instrument, is able to discriminate between the GOLD severity groups. We also estimated the minimal clinically important difference in the EQ-5D utility index scores that would be meaningful for patients with COPD. We found that the EQ-5D utility, EQ-VAS, and CAT scores differed significantly by COPD severity, and this differentiation was evident even when controlling for confounding variables. In addition to the tests of variance and covariance, this result was also consistent with the findings of the analysis of the effect size using Cohen's d. In particular, the performance of the EQ-5D utility index in assessing and differentiating quality of life was not inferior to that of the CAT, which is a COPD-specific instrument. Further, the correlation of the EQ-5D utility index with pulmonary function was higher than that of the other two measures. Preference-based utility valuation may not consider the impact of medical conditions on individual patients, and previous studies have reported that MAU measures such as the EQ-5D utility index have limitations in detecting HRQoL changes or differences in the COPD population [4,13,14]. Especially in the EQ-5D-3L descriptive system, ceiling effects have been observed due to the limitation of three response categories per item and the lack of important health-related quality of life dimensions such as vitality [42,43]. In the present study, the failure of the EQ-5D utility index to statistically differentiate between the mild and moderate COPD groups could be attributed to this ceiling effect, which would have affected the reporting of the relatively mild health status of patients. In this study, 50% (16/32) of the participants in the mild group and 31.4% (49/156) of those in the moderate group reported full health (11111). 
Indeed, the ceiling effect is a limitation of the EQ-5D utility index that may be somewhat improved in the 5L system [44]. However, since a valuation study for the EQ-5D-5L in South Korea had not been completed at the time of the present survey, it was not used.
Despite the instrumental constraints of the EQ-5D-3L, our results suggest that generic MAU measures discriminate as well as condition-specific measures. The CAT was developed to replace the Saint George's Respiratory Questionnaire (SGRQ), which is a complex and time-consuming tool for evaluating the quality of life of patients with COPD. However, the SGRQ has also been reported to show weak correlations with physiological indices such as FEV1 [45]. While the CAT and SGRQ mainly deal with physical functioning and symptoms, MAU measures, such as the EQ-5D utility index, include social or mental functioning such as usual activity, anxiety, and depression. Therefore, they seem to reflect the effect of lung function impairment on the patient's health-related quality of life more comprehensively. However, this result should not be extended to the individual patient level.
Previous studies have reported conflicting results for differences in the quality of life scores of moderate and severe COPD groups. Some authors have noted that this is because a trial population tends to show less variation than a real-life population does owing to application of a strict protocol [46]. The present study was designed to investigate the quality of life of patients with COPD. Although the patients included in this study were not a population-based sample representative of the true population distribution, the inclusion/exclusion criteria were not strictly limited. Therefore, we could include a variety of patients. Additionally, only the patients with a stable status were selected, and we also ensured that at least 20 patients were recruited per GOLD category. 
Therefore, we were able to obtain meaningful results in terms of differentiation that were significant even after adjusting for other factors. However, this study could also be affected by limitations pertaining to the hospital setting, such as too few patients with mild COPD. Therefore, we could not confirm the significance of the difference in health-related quality of life between the mild and moderate COPD groups. However, we did find a moderate difference between the two groups in terms of effect size, which is relatively independent of the sample size. In South Korea, the use of tertiary hospitals, even in the regular management of chronic diseases, is common. Overall, the present study reported higher utility values than those reported in previous studies, which might reflect differences in the applied valuation methods or national algorithms. A meta-analysis acknowledged a considerable amount of variation across studies. However, there was consensus on the finding that the greater the severity, the greater the degree of deterioration of quality of life [10,11,47].
Interest in the MCID of the EQ-5D utility index has increased in recent years, but there is no widely accepted value or range. Further, the MCID of the EQ-5D utility index score of patients with COPD was included as a subgroup in the study conducted by Walters and Brazier [48], but it did not yield a significant result by itself. Therefore, the present study is meaningful in that it identified the minimally important difference in the quality of life of patients with COPD as measured by the EQ-5D utility index. This empirical evidence could be used in the process of establishing the MCID for the EQ-5D utility index. Furthermore, it is recommended that the estimation of the MCID should be based on multiple approaches or on the triangulation of methods rather than using one approach alone [28,49]. In the present study, we obtained a pooled estimate of the MCID of the EQ-5D utility index of 0.028 (range: 0.017-0.033) for patients with COPD using several relevant patient-rated and disease-specific variables as anchors. This finding was supported through the distribution-based method that employed Cohen's approach. In a study of the MAUT instrument with patients with rheumatoid arthritis, the MID estimates obtained using anchor-based methods were also consistent with those determined by the effect size method [50]. Furthermore, some studies reported an MCID of 0.03 for the EQ-5D utility index score of patients with COPD, which is similar to that observed in the present study [4,14,51]. Coretti et al. [29] found a remarkable heterogeneity in the methods of estimation of the MCID for the EQ-5D utility index and acknowledged that they may vary according to population and context. Thus, further discussion and research is needed to reach a consensus on the MCID for the EQ-5D utility index.

Conclusions
The present study examined the quality of life of patients with COPD using data obtained from a cross-sectional patient survey. Additionally, it examined the validity of the EQ-5D utility index both statistically and clinically. In conclusion, the EQ-5D utility index, a generic instrument, exhibited a good ability to distinguish between patients based on COPD severity, with a performance similar to that of the CAT, a COPD-specific instrument. The MCID estimates for the EQ-5D utility index obtained with Cohen's distribution-based approach and with the anchor-based methods were similar. The pooled MCID estimate for the EQ-5D utility index was 0.028. Given that the differences across all groups exceeded the MCID, the EQ-5D utility index also seems to be capable of capturing clinically meaningful differences. Finally, the utility score measured by the EQ-5D seems appropriate for use in the economic evaluation of patients with COPD.
Additional file 1: S1. HRQoL median score according to the GOLD severity group.