This study used a research database of health care claims, the 2001-2006 Medstat MarketScan Commercial Claims and Encounters and Medicare Supplemental Databases (Ann Arbor, MI). This national database contains over 500 million claim records per year from individuals with private health care insurance in the United States. Scientific studies based on this data source have been reported in more than 75 peer-reviewed articles. The data come from approximately 45 large employers who self-insure their employees and dependents. Typically, large corporations or government entities self-insure to manage their own risk pool of sick and healthy employees rather than hire an insurance company. Self-insured employers have detailed data on the health care utilization of their employees and dependents because medical care providers must submit claims for reimbursement for all services rendered.
The MarketScan database offers advantages over raw administrative claims because data files undergo validity and editing procedures to ensure high quality and consistency in fields across years. The data are evaluated against population norms, previous year summaries, and validated data subsets. Outliers are flagged and reviewed for coding or processing errors. Encounter data describe the individuals who are covered within the database and are audited at the health plan level; plans submitting incomplete data are excluded. Diagnostic and procedural codes are compared against validity algorithms and set to missing values if inconsistent. The encounter data include age, sex, geographic residence, and eligibility information. The prescription claims include the national drug codes, date of purchase, quantity, days' supply, and expenditure information. The medical claims contain payment information, diagnoses, procedure codes, and type of provider. For this analysis, we pooled annual files to create a dataset of approximately 12 million people.
We identified the study sample as individuals who had at least 2 inpatient or outpatient claims with a diagnosis of CF, using an International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) code of 277.0x, and who were continuously enrolled in health insurance with drug coverage for at least 1 year (n = 2,515). Requiring at least 2 claims for the diagnosis of CF has been shown to increase specificity for CF cases by increasing the probability of identifying true cases in administrative datasets, although sensitivity is generally low. Individuals were excluded if they had not used TIS as described below (n = 1,691) or had undergone lung transplantation (n = 20).
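The inclusion rule above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual SAS code: the function names, the `(person_id, codes)` claim layout, the assumption that codes are stored with a decimal point, and the reading of "at least 1 year" as 365 days are all hypothetical choices made for the example.

```python
from collections import Counter

def is_cf_code(code: str) -> bool:
    """Return True for ICD-9-CM diagnosis codes in the 277.0x family.

    Assumes codes are stored with the decimal point (e.g. "277.02").
    """
    return code.strip().startswith("277.0")

def select_cf_cohort(claims, enrolled_days, min_claims=2, min_days=365):
    """Return the set of person IDs meeting the inclusion rule.

    claims: iterable of (person_id, [diagnosis codes]) pairs, one pair per claim.
    enrolled_days: dict mapping person_id to days of continuous enrollment
        with drug coverage (hypothetical input layout).
    """
    cf_claims = Counter()
    for person_id, codes in claims:
        # Count each claim once, even if it carries several CF codes.
        if any(is_cf_code(c) for c in codes):
            cf_claims[person_id] += 1
    return {
        pid for pid, n in cf_claims.items()
        if n >= min_claims and enrolled_days.get(pid, 0) >= min_days
    }
```

The exclusions for no TIS use and lung transplantation would be applied to the returned set in a later step.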
Measure of TIS utilization
We developed a new measure of adherence with TIS to reflect the unique administration of this therapy as a cycle of 28 days on therapy and 28 days off therapy. Adherence was calculated as the sum of the days' supply dispensed during the year divided by 56, the length in days of one full on/off cycle. Overall adherence categories were defined as: low utilization ≤ 2 cycles, medium utilization > 2 to < 4 cycles, and high utilization ≥ 4 cycles. These categories were defined with clinician input and review of TIS utilization data in the hypothesis-generating phase of this study. Adherence, as defined by these annual utilization thresholds, was measured only during the first year of observation and did not include extemporaneously compounded aerosolized tobramycin.
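The calculation above can be written out as a short Python sketch. It follows the text literally (total days' supply divided by 56, then the three thresholds); the function names are hypothetical, and how days' supply is recorded on an individual TIS fill is an assumption not confirmed by the text.

```python
def tis_cycles(days_supply_fills, cycle_length=56):
    """Number of TIS cycles dispensed in the year.

    days_supply_fills: days' supply recorded on each TIS prescription claim
        during the first year of observation.
    cycle_length: divisor stated in the text (28 days on + 28 days off).
    """
    return sum(days_supply_fills) / cycle_length

def utilization_category(cycles):
    """Map a cycle count to the study's three utilization categories."""
    if cycles <= 2:
        return "low"
    if cycles < 4:
        return "medium"
    return "high"
```

For example, a person whose fills total 168 days' supply has 168 / 56 = 3 cycles and falls in the medium category.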
Measures of health care utilization
Health care utilization was extracted from all-cause medical claim encounters. Hospitalization was defined as any admission to an inpatient care setting. Total costs were summed over the year and categorized into two main settings of care (i.e., inpatient care and outpatient care) and prescription drugs. We inflated the health care cost values to the year 2006 using factors from the US Bureau of Labor Statistics current data on the Consumer Price Index.
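Inflating a cost to 2006 dollars amounts to scaling it by the ratio of the 2006 index to the index for the year the cost was incurred. The sketch below assumes that standard ratio method; the function name is hypothetical, the index values are placeholders rather than actual Bureau of Labor Statistics figures, and the text does not say which Consumer Price Index series was used.

```python
def to_2006_dollars(cost, year, cpi_by_year, target_year=2006):
    """Scale a cost from `year` dollars to `target_year` dollars.

    cpi_by_year: dict mapping calendar year to a price index value.
    """
    return cost * cpi_by_year[target_year] / cpi_by_year[year]

# Placeholder index values for illustration only; NOT actual BLS figures.
EXAMPLE_CPI = {2001: 100.0, 2006: 110.0}

# A 200.00 claim paid in 2001, expressed in 2006 dollars under the
# placeholder index above.
adjusted = to_2006_dollars(200.0, 2001, EXAMPLE_CPI)
```

Costs already incurred in 2006 are unchanged because the ratio equals 1.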
Demographic and health measures
We evaluated demographic characteristics, including age, gender, the 4 U.S. geographic regions of residence (i.e., Northeast, Midwest, South, and West), and type of health insurance plan (i.e., comprehensive, health maintenance organization, preferred provider organization, and other). In addition, we examined the claims data of each individual for primary and secondary diagnoses indicating selected comorbidities. A total burden of comorbidity score was calculated using the Diagnostic Cost Group Hierarchical Condition Category (DCG/HCC) classification system (DxCG, Boston, MA). The DCG/HCC model produces a risk score for each person based on the presence of 189 medical conditions in the diagnosis fields of claims records. The model uses diagnoses from all sites of service and imposes hierarchies on the resulting condition categories prior to calculating risk scores. The hierarchies identify the most costly manifestation of each distinct disease and decrease the sensitivity of the DCG/HCC models to coding idiosyncrasies. For ease of interpretation, we normalized each individual's score by the population average score and categorized the study population into two groups: those with average or lower-than-average comorbidity risk (relative to the general population) and those with higher-than-average risk. In the general population, the DCG/HCC has shown good predictive validity for death, hospitalization, and health care expenditures. In addition, we searched for two CF-related diagnoses not included in the comorbidity score: P. aeruginosa (ICD-9 codes 277.02, 041.7) and Failure to Thrive/Growth Failure (ICD-9 codes 783.41, 783.43).
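The normalization and two-group split of the comorbidity score can be sketched as follows. This illustrates only the arithmetic described above and does not reproduce the proprietary DCG/HCC scoring itself; the function names and the input layout are hypothetical, and the raw scores and population average are assumed to be supplied by the risk-adjustment software.

```python
def relative_risk_scores(scores, population_mean):
    """Divide each person's raw risk score by the population average.

    scores: dict mapping person_id to raw DCG/HCC risk score.
    A relative score of 1.0 means average comorbidity risk.
    """
    return {pid: s / population_mean for pid, s in scores.items()}

def comorbidity_group(relative_score):
    """Two-group split used in the text: at or below average vs. above."""
    if relative_score > 1.0:
        return "higher than average"
    return "average or lower"
```

A person whose raw score is three times the population average, for instance, has a relative score of 3.0 and falls in the higher-than-average group.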
For the statistical analyses, we calculated means, frequency distributions, and 95% confidence intervals, and tested group differences using chi-square tests and t-tests at the p < 0.05 level. We reported medians and interquartile ranges for all expenditures and used logistic regression to estimate the probability of hospitalization.
All statistical analyses were carried out with SAS 9.1.3 (SAS Institute, Inc., Cary, North Carolina). This study received an exemption waiver from the University of Massachusetts Institutional Review Board for the use of previously collected and de-identified data.