- Research article
- Open Access
Can healthcare utilization data reliably capture cases of chronic respiratory diseases? a cross-sectional investigation in Italy
BMC Pulmonary Medicine volume 17, Article number: 20 (2017)
Healthcare utilization (HCU) data are increasingly used for chronic disease surveillance. Nevertheless, no standard criteria are available for estimating the prevalence of high-impact diseases such as chronic obstructive pulmonary disease (COPD) and asthma. In this study, an algorithm for recognizing COPD and asthma cases from HCU data was developed and implemented in the HCU databases of the Italian Lombardy Region (about 10 million residents). The impact of diagnostic misclassification on the reliability of prevalence estimates was also assessed.
Disease-specific drug codes, hospital discharges and, when available, co-payment exemptions were combined according to the patient’s age to create the proposed algorithm. Identified cases were used for prevalence estimation. An external validation study was also performed to evaluate the systematic uncertainty of the prevalence estimates.
Raw prevalence of COPD and asthma in 2010 was 3.6% and 3.3%, respectively. According to external validation, sensitivity was 53% for COPD and 39% for asthma. Adjusted prevalence estimates were 6.8% for COPD (among persons aged 40 years or older) and 8.5% for asthma (among persons younger than 40 years).
COPD and asthma prevalence may be estimated from HCU data, albeit with high systematic uncertainty. Validation is recommended in this setting.
Chronic obstructive pulmonary disease (COPD) and asthma are two of the most common chronic respiratory diseases (CRDs). CRDs have a large impact on public health due to both their high prevalence and related morbidity and mortality and their substantial socio-economic costs [1–3].
Assessing the burden of CRDs through valid and up-to-date prevalence estimates may help healthcare decision makers guide public health policy, even though such estimates are not easy to obtain.
Since patients affected by CRDs plausibly make use of healthcare services during the course of their disease, healthcare utilization (HCU) databases are frequently considered as useful data sources to capture CRD cases and estimate their prevalence in large unselected populations [4–7]. To this aim, investigators typically use ad-hoc algorithms based on the use of healthcare services, such as drug dispensations or hospital admissions. Clearly, such algorithms may be characterized by different operating characteristics, but validated standards are currently unavailable in this setting.
To assess the reliability of such algorithms in capturing patients affected by COPD and asthma, we conducted a cross-sectional study based on the HCU databases of the Italian Lombardy Region. In this study, we i) employed several algorithms to detect COPD and asthma cases from HCU data, ii) assessed the agreement between them, iii) evaluated the impact of misclassification on the prevalence estimates of COPD and asthma and iv) compared our prevalence estimates with those available in the scientific literature.
The data used for this study were retrieved from the HCU databases of Lombardy, a Region of Italy accounting for about 16% (almost 10,000,000) of the national population. In Italy, the entire population benefits from healthcare assistance provided by the National Health Service (NHS), which in Lombardy has been associated since 1997 with an automated system of databases. Among others, these include: 1) an archive of NHS beneficiaries (practically the whole resident population), reporting demographic and administrative data; 2) a hospital discharge database, reporting all discharge diagnoses released from public or private hospitals; 3) an outpatient drug prescriptions database, reporting all dispensations of drugs reimbursable by the NHS; and 4) an archive of co-payment exemptions, reporting information on all beneficiaries of co-payment exemptions granted for selected chronic diseases. For each patient, we linked these databases via a single anonymous identification code in full preservation of individuals’ privacy.
Algorithms for case detection and prevalence estimation
The target population consisted of all beneficiaries of NHS assistance residing in the Lombardy Region in 2010.
Three algorithms were considered for detecting patients suffering from COPD and asthma in the HCU databases. The first, denoted the reference algorithm, was based on the expert opinion of the scientific board of the CRACK-CRD program, composed of general practitioners, lung specialists and epidemiologists. This algorithm combined age (<40 years for asthma and ≥40 years for COPD) with the use of healthcare services, namely the hospital discharges, drug dispensations and co-payment exemptions (asthma only) recorded in the databases during 2010 (Table 1).
Starting from the criteria proposed by Anecchino et al. for COPD and Bianchi et al. for asthma, two comparison algorithms were also implemented, with a case definition based on age (<40 years for asthma and ≥40 years for COPD) and drug dispensations only. The drugs considered in these algorithms are those included in the reference algorithm. In particular, patients were identified as cases if they received at least one dispensation (permissive algorithm) or at least two dispensations (restrictive algorithm) of the considered medications during 2010 (Table 2).
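The two comparison algorithms can be illustrated with a minimal Python sketch. It is not the code used in the study (the analyses were run in SAS): the record layout, the function names and the toy data are assumptions made for illustration, and the mapping from ATC codes to a disease label is assumed to have been done beforehand using the codes in Additional file 1: Table S1.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispensation:
    """One drug dispensation recorded during the study year (hypothetical layout)."""
    patient_id: str  # anonymous identification code
    disease: str     # "asthma" or "COPD", assumed pre-mapped from the ATC code


def age_eligible(disease: str, age: int) -> bool:
    """Age criterion from the text: <40 years for asthma, >=40 years for COPD."""
    return age < 40 if disease == "asthma" else age >= 40


def detect_cases(dispensations, ages, disease, min_dispensations):
    """Return the set of patient IDs classified as cases of `disease`.

    min_dispensations=1 corresponds to the permissive algorithm,
    min_dispensations=2 to the restrictive algorithm.
    """
    counts = Counter(d.patient_id for d in dispensations if d.disease == disease)
    return {
        pid
        for pid, n in counts.items()
        if n >= min_dispensations
        and pid in ages
        and age_eligible(disease, ages[pid])
    }


# Toy data, invented for illustration only.
ages = {"p1": 25, "p2": 67, "p3": 31, "p4": 55}
dispensations = [
    Dispensation("p1", "asthma"),
    Dispensation("p1", "asthma"),
    Dispensation("p3", "asthma"),
    Dispensation("p2", "COPD"),
    Dispensation("p4", "COPD"),
    Dispensation("p4", "COPD"),
    Dispensation("p2", "asthma"),  # not counted: patient is 67, outside the asthma age range
]

permissive_asthma = detect_cases(dispensations, ages, "asthma", 1)
restrictive_asthma = detect_cases(dispensations, ages, "asthma", 2)
permissive_copd = detect_cases(dispensations, ages, "COPD", 1)
restrictive_copd = detect_cases(dispensations, ages, "COPD", 2)
```

By construction, every case found by the restrictive rule is also found by the permissive rule, so the permissive algorithm can only yield an equal or larger number of cases.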
The specific codes used by the three algorithms to identify asthma and COPD cases, namely drug prescriptions (ATC codes) [9, 10], discharge diagnoses (ICD-9 CM codes) for asthma and COPD, and exemptions for asthma, are listed in Additional file 1: Table S1.
Cases detected by each algorithm according to the criteria reported in Tables 1 and 2 were used to estimate the raw prevalence of asthma and COPD, taking all beneficiaries of NHS assistance during 2010 as the reference population.
Assessing between-algorithm agreement
The between-algorithm agreement in detecting patients suffering from COPD and asthma was measured by means of Cohen’s Kappa (K) index. Following Landis & Koch, values of K ≥0.80 were considered to represent optimal agreement.
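As an illustration of the agreement measure, the following Python sketch computes Cohen’s Kappa for two algorithms that each classify every person as case or non-case. The function names and the counts in the example are invented for illustration; the verbal labels follow the Landis & Koch scale, whose top band starts just above 0.80.

```python
def cohen_kappa(both: int, only_a: int, only_b: int, neither: int) -> float:
    """Cohen's Kappa for two binary classifications of the same population.

    both    -- persons flagged as cases by algorithms A and B
    only_a  -- persons flagged by A only
    only_b  -- persons flagged by B only
    neither -- persons flagged by neither algorithm
    """
    n = both + only_a + only_b + neither
    if n == 0:
        raise ValueError("empty population")
    observed = (both + neither) / n              # observed agreement
    p_a = (both + only_a) / n                    # share flagged by A
    p_b = (both + only_b) / n                    # share flagged by B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa: float) -> str:
    """Verbal label for a Kappa value on the Landis & Koch scale."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented counts for a population of 100 persons.
example_kappa = cohen_kappa(both=20, only_a=5, only_b=10, neither=65)
```

Because Kappa corrects for chance agreement, two algorithms can agree on the vast majority of non-cases in a low-prevalence population and still obtain only a modest Kappa if they flag different persons as cases.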
External validation and accounting for misclassification
The algorithms were validated using data retrieved from a network of about 50 general practitioners (GPs) from the Lombardy Region who participated on a voluntary basis. The GPs identified among their patients those with a diagnosis of COPD or asthma, based either on standard practice criteria, including evaluation of the manifestations of the disease and of patient characteristics such as a history of chronic or recurrent cough, sputum, wheezing or shortness of breath, or on a diagnosis of COPD or asthma made by a lung specialist or other specialized doctor. Information on these patients was then reported to us. Assuming that the GPs’ diagnoses were error-free, the proportion of individuals detected by a given algorithm as suffering from COPD or asthma among those reported by GPs defines the sensitivity (SE) of that algorithm.
The method proposed by Rogan & Gladen was used to account for diagnostic misclassification. In particular, assuming 100% specificity, the adjusted prevalence was calculated as the ratio between raw prevalence and SE.
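The correction can be sketched in a few lines of Python. The function names are illustrative, and the sketch implements the general Rogan & Gladen estimator, (raw prevalence + specificity − 1) / (sensitivity + specificity − 1), which reduces to raw prevalence divided by SE when specificity is assumed to be 100%. The inputs in the example are the raw prevalence and sensitivity values reported in this article for the reference algorithm.

```python
def sensitivity(detected_among_gp_cases: int, gp_cases: int) -> float:
    """Share of GP-reported cases that the algorithm also detects."""
    if gp_cases <= 0:
        raise ValueError("at least one GP-reported case is required")
    return detected_among_gp_cases / gp_cases


def rogan_gladen(raw_prevalence: float, se: float, sp: float = 1.0) -> float:
    """Prevalence adjusted for misclassification (Rogan & Gladen estimator).

    With sp = 1.0 (100% specificity) this is simply raw_prevalence / se.
    """
    denominator = se + sp - 1.0
    if denominator <= 0:
        raise ValueError("se + sp must exceed 1 for the estimator to be usable")
    adjusted = (raw_prevalence + sp - 1.0) / denominator
    # Truncate to the admissible range of a proportion.
    return min(max(adjusted, 0.0), 1.0)


# Values reported in this article for the reference algorithm.
copd_adjusted = rogan_gladen(raw_prevalence=0.036, se=0.53)    # about 0.068, i.e. 6.8%
asthma_adjusted = rogan_gladen(raw_prevalence=0.033, se=0.39)  # about 0.085, i.e. 8.5%
```

The lower the sensitivity, the larger the correction: with SE of 39%, the adjusted asthma prevalence is more than twice the raw figure, which is why the uncertainty in SE carries over directly to the adjusted estimate.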
Comparison with the literature
We carried out a MEDLINE and Google Scholar search for studies published from 2005 to 2013 reporting the prevalence of COPD or asthma in Italy. The studies were classified according to their data source. In particular, we included studies based on HCU databases, networks of GPs and population-based surveys. The prevalence reported by each individual study was compared with that obtained by applying our reference algorithm to Lombardy’s HCU databases, considering the same calendar years and age range as the study in question. Both raw and adjusted prevalence derived from the Lombardy HCU databases were reported.
Statistical analyses were performed using SAS (v9.3; SAS Institute, Cary, North Carolina, USA).
In 2010 there were 10,172,161 NHS beneficiaries in Lombardy (43% with age <40 years).
The number of COPD and asthma cases detected in this population varied substantially depending on the algorithm considered (Table 3).
The majority of patients with asthma were detected by the reference algorithm through prescriptions (74%) and exemptions (16%). Only 1% of cases were identified by hospitalization only. For COPD too, the largest number of cases was identified through prescriptions (84%), with 7% identified by hospitalization only and 9% by both sources.
There were no substantial differences in prevalence between the reference and restrictive algorithms: the prevalence estimates were 3.3% for asthma with both algorithms, and 3.6% and 3.8% for COPD with the reference and restrictive algorithms respectively, while the permissive algorithm yielded much higher estimates. According to the Landis and Koch scale, agreement between the reference and the permissive comparison algorithm was moderate for COPD (Kappa index = 0.46) and fair for asthma (Kappa index = 0.35), suggesting that the two algorithms often detected different patients. Substantial agreement was instead observed for the restrictive algorithm for both respiratory diseases investigated. Moreover, both the reference and restrictive algorithms detected just over half of the patients suffering from COPD (sensitivity estimates of 53% and 51%, respectively) and about a third of those suffering from asthma (sensitivity estimates of 39% and 31%, respectively). As expected, a higher number of cases was detected by the permissive algorithm, but its strong disagreement with the reference one suggests either that most of them were false positives or, conversely, that the reference algorithm is unable to detect all potential asthma and COPD cases because of its low sensitivity.
Table 4 reports the age- and gender-specific distribution and prevalence of asthma and COPD according to the different algorithms implemented.
Prevalence of COPD seems to increase with age in both males and females according to all algorithms. Regarding gender differences, with the reference algorithm the prevalence seems higher in females than in males, while no strong differences were observed in the gender-specific estimates obtained with the other algorithms. Regarding asthma, the estimates are higher in males than in females, in particular in the age class 0–19 years.
Studies based on HCU data generally reported COPD and asthma prevalence very close to that obtained from our reference algorithm (raw prevalence). Similar COPD prevalence was also obtained from the only study based on GP data and from our reference algorithm. In all the other cases (i.e., surveys reporting COPD or asthma prevalence and GP-based studies reporting asthma prevalence), our reference algorithm yielded much lower prevalence than originally reported. As expected, estimates were closer after accounting for diagnostic misclassification, although the original survey-based prevalence almost always remained higher. Finally, it should be emphasized that the original estimates were more heterogeneous than those based on our algorithm. In fact, COPD prevalence ranged from 2.8 to 7.2% in the original estimates and from 3.6 to 7.2% in the algorithm-based ones, while asthma prevalence ranged from 3.5 to 10.7% and from 3.3 to 7.9%, respectively.
An algorithm for detecting patients suffering from COPD and asthma from HCU databases was applied to the population of the Italian Region of Lombardy in the year 2010. We found a prevalence of 3.6% and 3.3% for COPD and asthma, respectively. Our algorithm was designed to favour the specificity of detection. In other words, since it is unlikely that an individual who does not suffer from COPD (or asthma) is hospitalized with a diagnosis of COPD (or asthma), uses a medication to treat COPD (or asthma) or benefits from an exemption for asthma, the rate of false positives detected by our algorithm is expected to be close to zero. It is not surprising that the prevalence estimates obtained by other algorithms, mainly based on drug dispensation [9, 10], widely disagree with ours, likely because of 1) false positive reports (e.g. patients suffering from bronchitis and bronchiectasis) and/or 2) overly broad drug categories (e.g. any respiratory medication) [6, 16, 17].
However, despite its high expected specificity, our algorithm has suboptimal sensitivity, which is a serious weakness for investigations aimed at measuring the burden of disease. In fact, just over half of the patients suffering from COPD and about one third of those suffering from asthma were detected by our algorithm, which biases prevalence seriously towards underestimation. After measuring the sensitivity of our algorithm through an external validation, however, we obtained adjusted prevalence estimates of 6.8% (COPD among persons aged 40 years or older) and 8.5% (asthma among persons younger than 40 years). These figures should be compared with the 8.8% prevalence of COPD in adults aged ≥40 years and the 7.4% prevalence of asthma in persons aged <44 years reported for the 28 countries of the European Union around 2010. The risk of misdiagnosis of COPD and asthma in general practice is generally considered to be of some concern, and for this reason GPs’ reports must be considered an imperfect gold standard for the validation of our algorithms. It follows that the prevalence estimates of COPD and asthma obtained in our study should be considered biased, even though they were corrected for potential diagnostic misclassification. It should be emphasized, however, that our prevalence estimates were similar to those obtained assuming a sensitivity of GPs’ diagnosis of 0.77 and 0.81 for COPD and asthma respectively (i.e., those found in a recent multicentre Italian study) and a specificity close to 1 (i.e., values of 0.98 and 1.00 for COPD and asthma respectively).
The raw prevalence estimates shown in the present study seem to be lower than those reported in other studies of the Italian population available in the scientific literature. Several reasons may explain these differences. First, the criteria used to identify COPD and asthma cases lack homogeneity: every study uses a different algorithm for case definition, characterized by its own predictive value, which affects the number of cases detected and consequently the prevalence estimates. Second, the data sources (HCU databases, surveys, GPs) used to estimate prevalence in the scientific literature differ in completeness and level of information, which may influence the prevalence estimates obtained. In particular, surveys can include patients with mild symptoms that do not lead them to seek medical care, and often investigate unspecific symptoms rather than carefully diagnosed diseases.
HCU databases are a useful data source for investigating the prevalence of diseases because they describe real-world practice very accurately; on the other hand, they cannot capture patients affected by COPD or asthma who do not require specific care, do not access healthcare or do not consult their GP.
In conclusion, our study confirms and adds further evidence that COPD and asthma should be considered important public health issues in Italy too, since almost 9% of children and young adults and 7% of older adults are affected by asthma and COPD, respectively. As a novel and original message, our study showed that HCU databases are useful sources for estimating the prevalence of COPD and asthma, provided that validated algorithms combining the use of several healthcare services are applied for detecting ill patients. This is of great importance because, given the wide availability of high-quality HCU data, monitoring and comparing the burden of chronic respiratory diseases, as well as evaluating the impact of public health services, can be accomplished with limited effort.
Chronic obstructive pulmonary disease
Chronic respiratory diseases
Healthcare utilization databases
Long-acting beta agonists
National health service
Prescription per year
Short-acting beta agonists
Global initiative for chronic obstructive lung disease: Global strategy for the diagnosis, management and prevention of chronic obstructive pulmonary disease. http://www.goldcopd.org. Accessed Aug 2014.
Bahadori K, Doyle-Waters MM, Marra C, Lynd L, Alasaly K, Swiston J, FitzGerald JM. Economic burden of asthma: a systematic review. BMC Pulm Med. 2009;9:24.
Pauwels RA, Rabe KF. Burden and clinical features of chronic obstructive pulmonary disease (COPD). Lancet. 2004;364(9434):613–20.
Chini F, Pezzotti P, Orzella L, Borgia P, Guasticchi G. Can we use the pharmacy data to estimate the prevalence of chronic conditions? a comparison of multiple data sources. BMC Public Health. 2011;11:688.
Gershon AS, Wang C, Guan J, Vasilevska-Ristovska J, Cicutto L, To T. Identifying individuals with physcian diagnosed COPD in health administrative databases. COPD. 2009;6(5):388–94.
Gini R, Francesconi P, Mazzaglia G, Cricelli I, Pasqua A, Gallina P, Brugaletta S, Donato D, Donatini A, Marini A, Zocchetti C, Cricelli C, Damiani G, Bellentani M, Sturkenboom MC, Schuemie MJ. Chronic disease prevalence from Italian administrative databases in the VALORE project: a validation through comparison of population estimates with general practice databases and national survey. BMC Public Health. 2013;13:15.
Wiréhn AB, Karlsson HM, Carstensen JM. Estimating disease prevalence using a population-based administrative healthcare database. Scand J Public Health. 2007;35(4):424–31.
Corrao G, Cesana G, Merlino L. Pharmacoepidemiological research and the linking of electronic healthcare databases available in the Italian region of Lombardy. BioMed Stat Clin Epidemiol. 2008;2:117–25.
Anecchino C, Rossi E, Fanizza C, De Rosa M, Tognoni G, Romero M; working group ARNO project. Prevalence of chronic obstructive pulmonary disease and pattern of comorbidities in a general population. Int J Chron Obstruct Pulmon Dis. 2007;2(4):567–74.
Bianchi M, Clavenna A, Sequi M, Bortolotti A, Fortino I, Merlino L, Bonati M. Anti-asthma medication prescribing to children in the Lombardy region of Italy: chronic versus new users. BMC Pulm Med. 2011;11:48.
Mapel DW, Frost FJ, Hurley JS, Petersen H, Roberts M, Marton JP, Shah H. An algorithm for the identification of undiagnosed COPD cases using administrative claims data. J Manag Care Pharm. 2006;12:457–65.
Italian Health Ministry. http://www.salute.gov.it/portale/temi/ricercaEsenzioniTicket.jsp. Accessed Jan 2016.
Fleiss JL, Levin B, Cho Paik M. Statistical methods for rates and proportions. 3rd ed. Hoboken, New Jersey: John Wiley & Sons; 2003.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74.
Rogan WJ, Gladen B. Estimating prevalence from the results of a screening test. Am J Epidemiol. 1978;107(1):71–6.
Faustini A, Canova C, Cascini S, Baldo V, Bonora K, De Girolamo G, Romor P, Zanier L, Simonato L. The reliability of hospital and pharmaceutical data to assess prevalent cases of chronic obstructive pulmonary disease. COPD. 2012;9(2):184–96.
Faustini A, Cascini S, Arcà M, Balzi D, Barchielli A, Canova C, Galassi C, Migliore E, Minerba S, Protti MA, Romanelli A, Tessari R, Vigotti MA, Simonato L. Chronic obstructive pulmonary disease prevalence estimated using a standard algorithm based on electronic health data in various areas of Italy. Epidemiol Prev. 2008;32(3 Suppl):46–55.
Cazzola M, Puxeddu E, Bettoncelli G, Novelli L, Segreti A, Cricelli C, Calzetta L. The prevalence of asthma and COPD in Italy: a practice-based study. Respir Med. 2011;105(3):386–91.
de Marco R, Cappa V, Accordini S, Rava M, Antonicelli L, Bortolami O, Braggion M, Bugiani M, Casali L, Cazzoletti L, Cerveri I, Fois AG, Girardi P, Locatelli F, Marcon A, Marinoni A, Panico MG, Pirina P, Villani S, Zanolin ME, Verlato G, GEIRD Study Group. Trends in the prevalence of asthma and allergic rhinitis in Italy between 1991 and 2010. Eur Respir J. 2012;39(4):883–92.
de Marco R, Pesce G, Marcon A, Accordini S, Antonicelli L, Bugiani M, Casali L, Ferrari M, Nicolini G, Panico MG, Pirina P, Zanolin ME, Cerveri I, Verlato G. The coexistence of asthma and chronic obstructive pulmonary disease (COPD): prevalence and risk factors in young, middle-aged and elderly people from the general population. PLoS One. 2013;8(5):e62985.
Demoly P, Paggiaro P, Plaza V, Bolge SC, Kannan H, Sohier B, Adamek L. Prevalence of asthma control among adults in France, Germany, Italy, Spain and the UK. Eur Respir Rev. 2009;18(112):105–12.
Jarvis D, Newson R, Lotvall J, Hastan D, Tomassen P, Keil T, Gjomarkaj M, Forsberg B, Gunnbjornsdottir M, Minov J, Brozek G, Dahlen SE, Toskala E, Kowalski ML, Olze H, Howarth P, Krämer U, Baelum J, Loureiro C, Kasper L, Bousquet PJ, Bousquet J, Bachert C, Fokkens W, Burney P. Asthma in adults and its association with chronic rhinosinusitis: the GA2LEN survey in Europe. Allergy. 2012;67(1):91–8.
ISTAT Italian Institute of Statistics. National Health Survey. http://www.istat.it/it/archivio/10836 (2008). Accessed January 2016.
To T, Stanojevic S, Moores G, Gershon AS, Bateman ED, Cruz AA, Boulet LP. Global asthma prevalence in adults: findings from the cross-sectional world health survey. BMC Public Health. 2012;12:204.
Gibson GJ, Loddenkemper R, Lundbäck B, Sibille Y. Respiratory health and disease in Europe: the new European lung white book. Eur Respir J. 2013;42(3):559–63.
Lusuardi M, De Benedetto F, Paggiaro P, Sanguinetti CM, Brazzola G, Ferri P, Donner CF. A randomized controlled trial on office spirometry in asthma and COPD in standard general practice: data from spirometry in asthma and COPD: a comparative evaluation Italian study. Chest. 2006;129(4):844–52.
Belleudi V, Agabiti N, Kirchmayer U, Cascini S, Bauleo L, Berardini L, Pinnarelli L, Stafoggia M, Fusco D, Arcà M, Davoli M, Perucci CA. Definition and validation of a predictive model to identify patients with chronic obstructive pulmonary disease (COPD) from administrative databases. Epidemiol Prev. 2012;36(3–4):162–71.
Romanelli AM, Raciti M, Protti MA, Prediletto R, Fornai E, Faustini A. How reliable are current data for assessing the actual prevalence of chronic obstructive pulmonary disease? PLoS One. 2016;11(2):e0149302.
This work was supported by funding from GlaxoSmithKline.
Members of the “CRD Real-1 World Evidence” scientific board:
Ovidio Brignoli. Italian College of General Practitioners, Florence, Italy.
Isa Cerveri. IRCCS “San Matteo” Hospital Foundation Pavia, Italy and University of Pavia, Pavia, Italy.
Giovanni Corrao. Unit of Biostatistics, Epidemiology and Public Health, Laboratory of Healthcare Research & Pharmacoepidemiology, Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy.
Eugenio Guffanti. Unit of Pulmonary Rehabilitation, Research Hospital of Casatenovo, Italian National Research Centre on Aging (INRCA), Casatenovo, Italy.
Adriano Vaghi. Division of Pneumology, “Guido Salvini” Hospital, Garbagnate Milanese, Italy. Email: AVaghi@aogarbagnate.lombardia.it
Marco Villa. Local Health Authority ASL Cremona, Via San Sebastiano, Cremona, Italy.
Roberto de Marco. Unit of Epidemiology and Medical Statistics, University of Verona, Verona, Italy. Prof Roberto de Marco died during the drafting of this paper. The authors want to dedicate this paper to his memory
Availability of data and materials
The data used in this study are the property of the Lombardy Region and are stored by Lombardia Informatica S.p.A (healthcare utilization databases) and My search (GP data). The data can be accessed but cannot be shared. The data access procedure involves submitting a study protocol to the data owner and having the protocol evaluated by a qualified committee. If the research question is of interest to the data owner and the study is well designed, permission to access the data is granted.
GC generated the study idea and wrote the final manuscript; AB and RC performed all statistical analyses; AA, AG and LS contributed to the final manuscript writing and supervised all work; the other authors contributed by helping in interpreting the obtained results and in revising the first manuscript drafts. All authors read and approved the final manuscript.
Prof Roberto de Marco, already head of the Unit of Epidemiology and Medical Statistics, University of Verona, Verona, Italy, died during the drafting of this paper. The authors want to dedicate this paper to his memory.
Conflicts of interests
GC had a collaborative relationship with the advisory boards of Novartis and Roche. For all other authors, there have been no involvements that might raise the question of bias in the work reported or in the conclusions, implications, or opinions stated.
Consent for publication
No individual data are available in this paper.
Ethics approval and consent to participate
According to the rules from the Italian Medicines Agency (available at: http://www.agenziafarmaco.gov.it/sites/default/files/det_20marzo2008.pdf) retrospective studies without direct contact with patients do not need a written consent to process personal data when they are used for research aims.
Table S1. ICD-9 CM, ATC and exemption codes used in the reference algorithm and comparison algorithms applied to capture COPD and asthma cases among the beneficiaries of the Regional Health Service. Lombardy, Italy. (DOCX 15 kb)
About this article
Cite this article
Biffi, A., Comoretto, R., Arfè, A. et al. Can healthcare utilization data reliably capture cases of chronic respiratory diseases? a cross-sectional investigation in Italy. BMC Pulm Med 17, 20 (2017). https://doi.org/10.1186/s12890-016-0362-6
- Chronic obstructive pulmonary disease
- Healthcare utilization database