Systematic review with meta-analysis of the epidemiological evidence relating smoking to COPD, chronic bronchitis and emphysema

Background: Smoking is a known cause of the outcomes COPD, chronic bronchitis (CB) and emphysema, but no previous systematic review exists. We summarize evidence for various smoking indices.
Methods: Based on MEDLINE searches and other sources we obtained papers published to 2006 describing epidemiological studies relating incidence or prevalence of these outcomes to smoking. Studies in children or adolescents, or in populations at high respiratory disease risk or with co-existing diseases were excluded. Study-specific data were extracted on design, exposures and outcomes considered, and confounder adjustment. For each outcome RRs/ORs and 95% CIs were extracted for ever, current and ex smoking and various dose response indices, and meta-analyses and meta-regressions conducted to determine how relationships were modified by various study and RR characteristics.
Results: Of 218 studies identified, 133 provide data for COPD, 101 for CB and 28 for emphysema. RR estimates are markedly heterogeneous. Based on random-effects meta-analyses of most-adjusted RR/ORs, estimates are elevated for ever smoking (COPD 2.89, CI 2.63-3.17, n = 129 RRs; CB 2.69, 2.50-2.90, n = 114; emphysema 4.51, 3.38-6.02, n = 28), current smoking (COPD 3.51, 3.08-3.99; CB 3.41, 3.13-3.72; emphysema 4.87, 2.83-8.41) and ex smoking (COPD 2.35, 2.11-2.63; CB 1.63, 1.50-1.78; emphysema 3.52, 2.51-4.94). For COPD, RRs are higher for males, for studies conducted in North America, for cigarette smoking rather than any product smoking, and where the unexposed base is never smoking any product, and are markedly lower when asthma is included in the COPD definition. Variations by sex, continent, smoking product and unexposed group are in the same direction for CB, but less clearly demonstrated. For all outcomes RRs are higher when based on mortality, and for COPD are markedly lower when based on lung function. For all outcomes, risk increases with amount smoked and pack-years. Limited data show risk decreases with increasing starting age for COPD and CB and with increasing quitting duration for COPD. No clear relationship is seen with duration of smoking.
Conclusions: The results confirm and quantify the causal relationships with smoking.


Background
It has been known for many years that smoking causes chronic obstructive pulmonary disease (COPD). In 1984, the US Surgeon General [1] concluded that, in the United States, 80 to 90% of morbidity from COPD is attributable to cigarette smoking. However, we know of no previous systematic review quantifying this relationship by meta-analysis, and we attempt to rectify this omission. It is recognized [1] that COPD comprises three separate, often interconnected disease processes: (1) airway thickening and narrowing with expiratory airflow obstruction; (2) chronic mucus hypersecretion, resulting in chronic cough and phlegm production; and (3) emphysema, an abnormal dilation of distal airspaces combined with destruction of alveolar walls. The present review considers all three processes by summarizing the epidemiological evidence relating smoking separately to the incidence or prevalence of COPD, chronic bronchitis (CB) and emphysema. Elsewhere [2], we systematically review evidence on the relationship between smoking and decline in forced expiratory volume in one second (FEV1).
Because COPD is rarely seen in children or adolescents, we restrict attention to adults. We also limit attention to studies of the general population, so do not, for example, consider studies in subjects suffering from alpha-1 antitrypsin deficiency or exposed to particular respiratory hazards. To provide a broad description of the relationship, we do not concentrate on one primary analysis, but quantify the relationship of each of the three outcomes studied (COPD, CB, emphysema) to each of a range of indices of smoking, investigating how these relationships vary according to characteristics such as sex, age, location, study design, period considered, definition of outcome, definition of exposure and extent of confounder adjustment.

Methods
Full details of the methods used are described in Additional file 1, and are summarized below.

Inclusion and exclusion criteria
Attention was restricted to epidemiological studies published before 2007 on COPD, CB or emphysema, providing relative risk (RR) estimates for one or more defined "major indices" (ever, current or ex smoking compared with never smoking) or "dose-related indices" (amount smoked, age of starting to smoke, pack-years smoked, duration of smoking or duration of quitting). Throughout this paper, we use the term RR to include its various estimators, including the odds ratio and the hazard ratio.
Studies were excluded if in children or adolescents, or in subjects at especially high risk of respiratory disease (e.g. workers in risky occupations), selected as having co-existing diseases or conditions, or from atypical populations likely to have a highly unusual prevalence of smoking or disease. Also excluded were uncontrolled case studies, and studies of disease exacerbation or undiagnosed disease, of symptom-free subjects, or where the only results were adjusted for symptoms or precursors of disease.

Definition of the outcomes

COPD
The term COPD is quite recent, so studies with outcomes described otherwise were also included. These could be based on International Classification of Diseases (ICD) codes, on lung function criteria, on a combination of lung function criteria and symptoms, or on combinations of diagnosed conditions (such as CB or emphysema, or CB, emphysema or asthma), where diagnoses were extracted from medical records or reported in questionnaires. Unacceptable outcomes included CB or emphysema separately, acute or unspecified bronchitis, non-specific respiratory disease, or outcomes based only on symptoms and not on lung function. The range of ICD codes had to cover both CB and emphysema, and could also cover asthma, acute and unqualified bronchitis, bronchiectasis and some other defined lung conditions. Broader-ranging definitions (e.g. respiratory disease) were not accepted. Acceptable lung function criteria included those of the Global Initiative for Chronic Obstructive Lung Disease (GOLD) [3,4], the British Thoracic Society (BTS) [5], the European Respiratory Society (ERS) [6] and the American Thoracic Society (ATS) [7][8][9]. Use of a bronchodilator was not a requirement.

CB
Where based on the ICD, the range had to include the code(s) for CB and could also include codes for acute or unspecified bronchitis. Acceptable outcomes could also be based on medical records, in-study diagnosis, self-report of physician diagnosis or of history of the disease, or on symptoms. The British Medical Research Council (MRC) criterion of daily productive cough for at least three consecutive months for more than two successive years [10,11] was recognized as a set of symptoms defining CB. Diagnoses or symptoms called "bronchitis" were accepted where the context clearly indicated it was chronic. Diagnoses based on symptoms not referred to as CB were also accepted, provided the definition included both chronic cough and phlegm.

Emphysema
The outcome could be based on the ICD code for emphysema, on medical records, in-study diagnosis, or on self-report of physician diagnosis or history of the disease.

Choice of outcome
Where a study provided data for multiple acceptable definitions of an outcome, results were entered only for one. Additional file 1 gives the rules specifying choice of outcome, and, for studies providing a choice, lists definitions selected and rejected. It also gives, for all studies, the description of the disease and the source of the diagnosis for all outcomes where data were entered.

Literature searching
Searching was carried out in phases. Initially, 1407 potentially relevant papers, published up to 2002, were derived by AJT from an unpublished project which used the MeSH terms chronic bronchitis and symptoms, emphysema, lung function, genetic determinants, mortality, adults and smoking. Subsequently, additional Medline searches were conducted in 2006 by AJT and in 2008 by BAF, using the MeSH term "Pulmonary disease, chronic obstructive". Papers were also sought from in-house files on smoking and health, and references cited in papers obtained. Publications before 2007 were considered, with no restriction on language or on peer-reviewed journals. Reasons for rejection were recorded.

Identification of studies
Relevant papers were allocated to studies, noting multiple papers on the same study, and papers reporting on multiple studies. Each study was given a unique reference code (REF) of up to 6 characters (e.g. DICKIN or CHEN3), based on the principal author's name, and distinguishing multiple studies by the same author. Occasionally, an original study was split into separate studies (e.g. where follow-up periods differed by sex).
Some studies were noted as having overlaps or links with other studies. To minimize problems in meta-analysis arising from double-counting of cases, these links were divided into three types, as shown in Additional file 2. The first involved no such double-counting, while the second included studies with minor overlap, which could not be disentangled, and which it was decided to ignore. The third type contains sets of studies which probably or definitely overlap. Here the set member containing the most valuable data (e.g. largest study size or longest follow-up) was called the 'principal study', other members being 'subsidiary studies' only considered in meta-analyses where the required RR was unavailable from the principal study.

Data recorded
For each study, relevant information was entered onto a study database and a linked RR database. The study database contains a record for each study, describing relevant publications, sexes considered, age range, location, timing, length of follow-up, whether principal or subsidiary, overlaps or links with other studies, study design, populations studied, major study weaknesses, outcome definitions, numbers of cases and subjects, types of controls and matching factors used in case-control studies, confounding variables, and availability of results for each smoking index. The RR database holds the detailed results, typically containing multiple records for each study. Each record is linked to the relevant study and refers to a specific RR, recording the comparison made and the results. This record includes the outcome, the sex and the analysis type (prevalence or incidence). Smoking exposure is defined by status (ever, current or ex), product (any, cigarettes, cigarettes only) and similar information about the unexposed base. For dose-related indices, the level of exposure is recorded. The source of the RR is also recorded, as are details on adjustment variables. Results recorded include numbers of exposed and unexposed cases, and, for unadjusted results, numbers of exposed and unexposed members of the comparison group. The RR itself and its lower and upper 95% confidence limits (LCL and UCL) are always recorded, with the odds ratio chosen if available for a prevalence analysis and the relative risk (or hazard ratio if provided) for an incidence analysis. These may be as reported, or derived by various means (see below), with the method of derivation noted.

Identifying which RRs to enter
For each outcome RRs were entered relating to defined combinations of smoking index (major or dose-related), confounders adjusted for, and sex, as described below.

The major smoking indices
The intention was to enter RRs comparing current smokers, ever smokers or ex smokers with never smokers. Near-equivalent definitions were accepted when stricter definitions were unavailable, so that never smokers could include occasional smokers (or exceptionally, light smokers), while current smokers could include, and ex smokers exclude, those who quit smoking up to two years ago. If available, results were entered for five comparisons: any product vs. never any product, cigarettes vs. never any product, cigarettes only vs. never any product, cigarettes vs. never cigarettes, and cigarettes only vs. never cigarettes. Here "cigarettes" ignores whether other products (i.e. pipes and cigars) are smoked, while "cigarettes only" excludes mixed smokers.

Dose-related smoking indices
RRs were entered for five measures: amount smoked, age of starting, pack-years (cigarettes smoked per day times years of smoking, divided by 20), duration of smoking and duration of quitting. RRs were expressed relative to never smokers (or near equivalent), if available, or relative to non smokers otherwise. For duration of quitting, RRs were also expressed relative to current smokers. Further RRs were entered, restricted to smokers, and expressed relative to the level expected to have the lowest risk (e.g. lowest amount smoked, or longest time quit).
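The pack-years definition given above can be written out as a short calculation. This is only an illustrative sketch; the function name `pack_years` is hypothetical and is not taken from the review's software.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = (cigarettes per day x years of smoking) / 20."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day * years_smoked / 20.0


# One pack (20 cigarettes) a day for 30 years
heavy = pack_years(20, 30)
# Half a pack a day for 15 years
light = pack_years(10, 15)
print(heavy, light)
```

Under this definition, halving daily consumption and doubling duration leaves pack-years unchanged.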

Confounders adjusted for
For prospective studies, results were entered adjusted for age and the greatest number of potential confounding variables for which results were available, and also adjusted for age only or age and the smallest number of confounders. Unadjusted results were only entered if no age-adjusted results were available. For other study types, results were entered adjusted for the greatest number of confounders, and also unadjusted (or adjusted for the smallest number of confounders). These alternative RRs are subsequently referred to as "most-adjusted" and "least-adjusted". For dose-related RRs restricted to smokers, results with "most adjustment" but without adjustment for other aspects of smoking were also entered if available.

Sex
Results were entered for males and females separately when available, with combined sex results only entered where sex-specific results were not available.

Derivation of RRs
Adjusted RRs and their 95% CIs were entered as provided, when available. Unadjusted RRs and CIs were calculated from their 2 × 2 table, using standard methods (e.g. [12]), noting any discrepancies between calculated values and those provided by the author. Sometimes the 2 × 2 table was constructed by summing over groups (e.g. adding current and ex smokers to obtain ever smokers) or from a percentage distribution. Various other methods were used as required to provide estimates of the RR and CI. The more commonly used methods are summarized below, fuller details being given in Additional file 1.
Correction for zero cell. If the 2 × 2 table has a zero cell, 0.5 was added to each cell, and the standard formulae applied.
Combining independent RRs. RRs were combined over ℓ strata (e.g. from a 2 × 2 × ℓ table) using fixed-effect meta-analysis [13], giving an estimate adjusted for the stratifying variable.
Combining non-independent RRs. The Hamling et al. method [14] was used (e.g. to derive an adjusted RR for ever smokers from available adjusted RRs for current and ex smokers, each relative to never smokers, or to combine adjusted RRs for several diseases, each relative to a single control or disease-free group).
Estimating CI from crude numbers. If an adjusted RR lacked a CI or p-value but the corresponding 2 × 2 table was available, the CI was estimated assuming that the ratio UCL/LCL was the same as for the equivalent unadjusted RR.
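Two of the derivations above can be illustrated with a minimal sketch. It assumes that the "standard formulae" are the usual odds ratio with a Woolf (log-scale) standard error, and that CIs are symmetric about the RR on the log scale; both are assumptions for illustration, and the function names are hypothetical. The exact formulae used in the review are those given in Additional file 1.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def odds_ratio_ci(a, b, c, d):
    """Odds ratio and 95% CI from a 2 x 2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    If any cell is zero, 0.5 is added to every cell first.
    """
    if min(a, b, c, d) == 0:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    lcl = math.exp(math.log(odds_ratio) - Z95 * se_log)
    ucl = math.exp(math.log(odds_ratio) + Z95 * se_log)
    return odds_ratio, lcl, ucl


def ci_from_crude_ratio(adjusted_rr, crude_lcl, crude_ucl):
    """CI for an adjusted RR, assuming its UCL/LCL equals that of the crude RR."""
    factor = math.sqrt(crude_ucl / crude_lcl)  # equals exp(z * SE) on the log scale
    return adjusted_rr / factor, adjusted_rr * factor


# Made-up counts, for illustration only
crude = odds_ratio_ci(40, 60, 10, 90)
adjusted_ci = ci_from_crude_ratio(5.0, crude[1], crude[2])
print(crude, adjusted_ci)
```

By construction the derived interval has the same UCL/LCL ratio as the crude interval and is centred on the adjusted RR on the log scale.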

Data entry and checking
Master copies of all the papers in the study file were read closely, with relevant information highlighted to facilitate checking. Where multiple papers are available for a study, a principal publication was identified, although details described only in other publications were also recorded. Preliminary calculations and data entry were carried out by one author and checked by another, and automated checks of completeness and consistency were also conducted. RR/CIs underwent validation checks [15].

Selecting RRs for the meta-analyses
All meta-analyses are restricted to records with available RR and CI values. The process of selecting RRs for inclusion in a meta-analysis must try to include all relevant data and to avoid double-counting. For a given analysis (e.g. of current cigarette smoking), several definitions of RR may be acceptable (e.g. cigarette smoking, or cigarette only smoking), so, for studies with multiple RRs, the one to be used is determined by an order of preference defined for the meta-analysis. Orders of preference may be required for smoking status, smoking product, the unexposed base, and extent of confounder adjustment. As the definitions of RR available may differ by sex (e.g. a study may provide RRs for any product smoking for males, but for cigarette smoking for females), the most appropriate RR is chosen within each sex. Sexes combined results are only considered where sex-specific results are not available. Similarly RRs from a subsidiary study are only used where eligible RRs are unavailable from the principal study. When multiple orders of preference are involved, the sequence of implementation may affect the selection, so preferences for the most important aspects, usually concerning smoking, are implemented first.
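The selection logic described above can be sketched as follows. The record layout, the study codes, the RR values and the particular orders of preference are all hypothetical, chosen only to illustrate the idea; the handling of principal versus subsidiary studies is omitted for brevity.

```python
# Hypothetical orders of preference for one meta-analysis (earlier = preferred)
PRODUCT_ORDER = ["any product", "cigarettes", "cigarettes only"]
BASE_ORDER = ["never any product", "never cigarettes"]
ADJUSTMENT_ORDER = ["most-adjusted", "least-adjusted"]


def preference_key(rec):
    # Smoking-related preferences are applied first, adjustment last
    return (
        PRODUCT_ORDER.index(rec["product"]),
        BASE_ORDER.index(rec["base"]),
        ADJUSTMENT_ORDER.index(rec["adjustment"]),
    )


def select_rrs(records):
    """Pick one RR per study and sex; combined-sex RRs are only a fallback."""
    by_study = {}
    for rec in records:
        by_study.setdefault(rec["study"], []).append(rec)
    selected = []
    for recs in by_study.values():
        sex_specific = [r for r in recs if r["sex"] in ("male", "female")]
        pool = sex_specific if sex_specific else recs
        for sex in sorted({r["sex"] for r in pool}):
            candidates = [r for r in pool if r["sex"] == sex]
            selected.append(min(candidates, key=preference_key))
    return selected


def make(study, sex, product, base, adjustment, rr):
    return {"study": study, "sex": sex, "product": product,
            "base": base, "adjustment": adjustment, "rr": rr}


# Made-up records, for illustration only
records = [
    make("STUDY1", "male", "cigarettes", "never any product", "most-adjusted", 3.1),
    make("STUDY1", "male", "any product", "never any product", "most-adjusted", 2.8),
    make("STUDY1", "male", "any product", "never any product", "least-adjusted", 3.0),
    make("STUDY1", "female", "cigarettes", "never cigarettes", "most-adjusted", 2.5),
    make("STUDY1", "combined", "any product", "never any product", "most-adjusted", 2.7),
    make("STUDY2", "combined", "cigarettes only", "never any product", "most-adjusted", 2.0),
]
selected = select_rrs(records)
for rec in selected:
    print(rec["study"], rec["sex"], rec["product"], rec["rr"])
```

Sorting on a tuple applies the preferences in sequence, so the first element (here smoking product) dominates, which mirrors the point that the order of implementation can change which RR is chosen.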

Carrying out the meta-analyses
Fixed-effect and random-effects meta-analyses were conducted using the methods of Fleiss and Gross [13], with heterogeneity quantified by H, the ratio of the heterogeneity chi-squared to its degrees of freedom. For all meta-analyses, Egger's test of publication bias [16] was also conducted.
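A minimal sketch of inverse-variance fixed-effect and random-effects pooling, with H computed as the heterogeneity chi-squared divided by its degrees of freedom, is given below. The between-study variance is estimated here by a method-of-moments formula of the DerSimonian-Laird form, which is an assumption made for illustration; the formulae actually used are those of Fleiss and Gross [13]. The function name and input values are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def meta_analyse(estimates):
    """Pool (rr, lcl, ucl) triples; returns fixed and random summaries, H and tau2."""
    y = [math.log(rr) for rr, _, _ in estimates]
    # Variance of log RR recovered from the width of the 95% CI
    var = [((math.log(ucl) - math.log(lcl)) / (2 * Z95)) ** 2
           for _, lcl, ucl in estimates]
    w = [1 / v for v in var]

    def pooled(weights):
        total = sum(weights)
        mean = sum(wi * yi for wi, yi in zip(weights, y)) / total
        half = Z95 / math.sqrt(total)
        return math.exp(mean), math.exp(mean - half), math.exp(mean + half)

    fixed = pooled(w)
    log_fixed = math.log(fixed[0])
    # Heterogeneity chi-squared about the fixed-effect estimate
    q = sum(wi * (yi - log_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(estimates) - 1
    h = q / df if df > 0 else float("nan")
    tau2 = 0.0
    if df > 0:
        # Assumed moment estimator of the between-study variance
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
    random_effects = pooled([1 / (v + tau2) for v in var])
    return {"fixed": fixed, "random": random_effects, "H": h, "tau2": tau2}


# Made-up RRs and CIs, for illustration only
demo = meta_analyse([(1.5, 1.2, 1.875), (4.0, 3.2, 5.0), (2.5, 1.8, 3.4)])
print(demo)
```

When the estimates are heterogeneous (H well above 1), the random-effects interval is wider than the fixed-effect one, because the between-study variance is added to each study's variance before weighting.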
A series of meta-analyses was conducted for each of the three main outcomes. For each meta-analysis conducted, combined estimates were made first for all the RRs selected, then for RRs subdivided by level of various characteristics, testing for heterogeneity between levels. These characteristics may include sex, continent, national cigarette type (blended, Virginia), start year of study, publication year, study type, lowest age included, highest age included, presence of study weakness, outcome subtype, how asthma was taken into account, use of a bronchodilator, study size (number of cases), analysis type (prevalence, onset), smoking product (any, cigarettes, cigarettes only), unexposed base (never any product, never cigarettes), smoking results available (ever smoking, current smoking, both), number of adjustment variables, whether the RR was adjusted for sex, age or for other factors, and how the RR and CI were derived. In this univariate approach, differences in fixed-effect estimates by level of a characteristic were tested for significance using an F-test which compared variation between and within levels of the characteristic considered. Additional file 1 fully defines the levels of each characteristic considered, and which characteristics are considered in each meta-analysis. It also details all the meta-analyses conducted, and describes the layout and notation used in the meta-analyses and associated forest and funnel plots.
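The univariate comparison of levels can be illustrated by partitioning the weighted sum of squares of the log RRs into between-level and within-level components. This is a sketch under the assumption that the F statistic is the ratio of the between-level mean square to the within-level mean square; the exact test used is defined in Additional file 1, and the function name and data are hypothetical.

```python
def between_within_f(groups):
    """F statistic comparing levels of one characteristic.

    groups maps each level to a list of (log_rr, se) pairs.
    Returns (F, df_between, df_within).
    """
    def q_about_mean(points):
        # Inverse-variance weighted sum of squares about the weighted mean
        weights = [1 / se ** 2 for _, se in points]
        mean = sum(w * y for w, (y, _) in zip(weights, points)) / sum(weights)
        return sum(w * (y - mean) ** 2 for w, (y, _) in zip(weights, points))

    all_points = [p for pts in groups.values() for p in pts]
    q_total = q_about_mean(all_points)
    q_within = sum(q_about_mean(pts) for pts in groups.values())
    q_between = q_total - q_within
    df_between = len(groups) - 1
    df_within = len(all_points) - len(groups)
    f_stat = (q_between / df_between) / (q_within / df_within)
    return f_stat, df_between, df_within


# Made-up log RRs and standard errors, for illustration only
example = {
    "male": [(1.2, 0.1), (1.4, 0.1)],
    "female": [(0.7, 0.1), (0.9, 0.1)],
}
print(between_within_f(example))
```

A p-value would then be read from an F distribution with the returned degrees of freedom, for example with `scipy.stats.f.sf` if SciPy is available.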
For each selected outcome and exposure, separate meta-analyses were conducted based on most-adjusted and least-adjusted RRs.
For the major smoking indices, four broad types of meta-analysis were conducted: A ever smoking, B current smoking, C ever smoking (but using current smoking RRs if ever smoking RRs are not available) and D ex smoking. In each type, RRs for the "main analysis" were selected in the following order of preference: firstly for smoking of any product vs. never smoked any product, then for smoking of cigarettes (or of cigarettes only) vs. never smoked any product, and then for smoking of cigarettes vs. never smoked cigarettes, accepting RRs vs. near-equivalents to never smokers only when RRs vs. never smokers were unavailable. A variant analysis used a different order of preference, so that RRs for cigarette smoking were preferred. In type C meta-analyses, a further variant analysis preferred RRs for current smoking to those for ever smoking. Other variant analyses restricted attention to specific subtypes of outcome (e.g. for COPD, whether the definition was based on mortality, on lung function criteria only, or on other definitions).
For the dose-related indices, meta-analyses were conducted for: E amount smoked, F age of starting to smoke, G pack-years, H duration of smoking, I duration of quitting compared to never smokers (or long-term ex smokers), and J duration of quitting compared to current smokers (or short-term quitters). For any measure, a study typically provides a set of non-independent RRs for each dose-category, expressed relative to a common base. To avoid double-counting only one was included in any one meta-analysis. Two approaches were adopted. The first involves specifying a scheme with a number of levels of exposure ("key values"), then carrying out meta-analyses for each level in turn. For an RR to be allocated to a key value, its dose-category has to include that key value and no other. Schemes with a few, widely spaced, key values tend to involve RRs from more studies, whereas schemes with more key values, closely spaced, involve RRs from fewer studies, but ones with dose categories more closely clustered around the key value.
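The allocation rule for the key-value approach can be sketched as below: an RR is assigned to a key value only if its dose category contains that key value and no other. The key values and dose categories shown are hypothetical, and category bounds are treated as inclusive, which is an assumption.

```python
def allocate_to_key_value(category, key_values):
    """Return the single key value lying inside the dose category, else None.

    category is a (low, high) pair with inclusive bounds;
    high may be float("inf") for an open-ended top category.
    """
    low, high = category
    inside = [k for k in key_values if low <= k <= high]
    return inside[0] if len(inside) == 1 else None


# Hypothetical scheme for cigarettes per day
key_values = [5, 20, 45]
categories = [(1, 9), (10, 19), (20, 29), (15, 50), (30, float("inf"))]
for cat in categories:
    print(cat, allocate_to_key_value(cat, key_values))
```

In this example the category 10 to 19 contains no key value and 15 to 50 contains two, so neither contributes, which illustrates why schemes with more, closely spaced key values draw on fewer studies.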

Meta-regression analyses of the major smoking indices
For COPD and CB meta-regression analyses were also carried out using the sets of RRs selected for the main meta-analyses for ever smoking and for current smoking. Following preliminary meta-regressions (not shown), a "basic model" was fitted which included eight categorical variables (sex, continent, outcome subtype, how asthma was taken into account, smoking product, unexposed base group, adjustment for age, and adjustment for factors other than age or sex) and also midpoint age, a continuous variable estimated from the age range of the population. The significance of each of these variables was estimated by an F-test based on the increase in deviance resulting from its exclusion from the basic model. A list of secondary variables was also defined (national cigarette type, publication year, study type, presence of a study weakness, use of a bronchodilator, study size, smoking results available for the study, method of derivation of the RR and CI and analysis type), with the significance of adding each characteristic to the basic model estimated by an F-test based on the increase in deviance. Alternative formulations of some basic variables were also tested; see also Additional file 1.

Additional analyses
For each outcome, and for ever smoking and current smoking, pairs of corresponding RR and CI estimates within the same study for males and for females were used to carry out meta-analyses of the sex ratio. Pairs of corresponding least-adjusted or most-adjusted RRs were also identified. Unlike the sex-specific pairs, these pairs were non-independent and the variance of their ratio cannot readily be calculated. Here the numbers of pairs where the most-adjusted/least-adjusted ratio exceeded or did not exceed 1 were compared by the sign test, with separate meta-analyses also conducted for the least-adjusted and most-adjusted members. Similar methods were also used to compare non-independent pairs of RRs for current smokers of cigarettes only and for current smokers of cigarettes ignoring other products.
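The two kinds of paired comparison can be sketched as follows. For independent male and female RRs from the same study, the variance of the log ratio is taken as the sum of the two log-scale variances; expressing the ratio as male divided by female is an assumption. For non-independent pairs, an exact two-sided sign test is computed from the counts of pairs above and not above 1. Function names and example values are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def log_se(lcl, ucl):
    """Standard error of log RR recovered from a 95% CI."""
    return (math.log(ucl) - math.log(lcl)) / (2 * Z95)


def sex_ratio(male, female):
    """Ratio of male to female RR with 95% CI; inputs are (rr, lcl, ucl)."""
    log_ratio = math.log(male[0]) - math.log(female[0])
    # Independent estimates, so log-scale variances add
    se = math.sqrt(log_se(male[1], male[2]) ** 2 + log_se(female[1], female[2]) ** 2)
    return tuple(math.exp(log_ratio + k * Z95 * se) for k in (0, -1, 1))


def sign_test_p(n_above, n_not_above):
    """Exact two-sided sign test p-value under a 50:50 null."""
    n = n_above + n_not_above
    k = min(n_above, n_not_above)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up values, for illustration only
print(sex_ratio((3.0, 2.0, 4.5), (1.5, 1.0, 2.25)))
print(sign_test_p(8, 2))
```

The per-study ratios and their CIs from `sex_ratio` could then be pooled in the same way as any other set of RRs.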

Software
All data entry and most statistical analysis were carried out using ROELEE version 3.1 (available from P.N.Lee Statistics and Computing Ltd, 17 Cedar Road, Sutton, Surrey SM2 5DA, UK). Some analyses were conducted using Excel 2003.

Studies identified
Some 218 relevant studies were identified, based on information from 298 papers.
For the 2,150 papers rejected, reasons are summarized in Table 1, with further details of the searching, including a flow diagram, shown in Additional file 1. Many papers had multiple reasons for rejection, the counts in Table 1 relating only to the first listed reason which applied. A Reference Manager file is available on request which, for each rejected publication, gives its reference and the reasons for rejection. Table 2 presents selected details of the 218 studies while Table 3 gives the distribution of their major characteristics. Additional file 2 gives fuller descriptions of the studies, including overlapping and linked studies, medical and other exclusions, detailed definitions of disease outcomes, and fuller distributions.
Of the 218 studies, 193 are classified as principal, 20 (10.4%) of these being case-control studies, 39 (20.2%) prospective, and 134 (69.4%) cross-sectional. The other 25 studies are classified as subsidiary. Ninety-three principal studies are of COPD only, 63 of CB only, nine of emphysema only, with 28 providing results for multiple outcomes. In total, information is available on COPD for 133 studies (116 principal), CB for 101 (87 principal) and emphysema for 28 (26 principal). [...] (4.1%) in South or Central America and seven (3.6%) elsewhere. Four (2.1%) were carried out in more than one of these areas. Of the 159 principal studies where the start year is given, 76 (47.7%) started before 1980. For 26 (13.5%) of the 193 studies a major study weakness is noted. Most commonly this is a failure to clarify, or to state at all, how study subjects were selected (studies ALESSA, ANDER3, COCCI, ITABAS, MOLLER, SHIMUR, ZIETKO). Other more commonly occurring weaknesses include use of unrepresentative samples which oversampled smokers (DEJONG, DETORR, JENSEN), those with respiratory disease (VOLLM1, VOLLM2) or those with occupational exposure (PETO, PRATT), and the use of controls that systematically differ from cases in various ways (BROGGE, DEAN1, LUNDB2, STERLI). These weaknesses are described more fully in the footnotes to Table 2.
Most principal studies provide some results compared to never smokers, 146 (75.6%) for current smokers, 134 (69.4%) for ex smokers and 158 (81.8%) for ever smokers. Dose-response data are commonly available by amount smoked (77 studies, 39.9%) and by pack-years (58, 30.1%), but less so by age of starting to smoke (17, 8.8%), duration of smoking (12, 6.2%) or duration of quitting (18, 9.3%).

Footnotes to Table 2
Study weaknesses:
ALESSA: Small clinical study, not stated how subjects selected
ANDER3: Small clinical study, not clear how controls were selected
BROGGE: More cases than controls were drawn from hospital sample (with hospitalisation for COPD in last 5 years) and their average age was 3.5 years older
CHEN3: Analysis combines current smokers with those who gave up in last 5 years, and omits those who started smoking before age 13 or after age 22
COCCI: Small clinical study, not stated how cases and controls were selected
DEAN1: Cases occurred in 1969-72 while information on controls was collected in 1973. Cases were population sample but controls were household members only
DEJONG: Non-representative convenience sample particularly aimed at smokers
DETORR: Subjects were volunteers, invited from all smokers attending wards or clinics, so likely to have concomitant disease
DONTA1: Inclusion of various lung diseases other than COPD in study endpoint, exclusion of subjects who died, emigrated or made dramatic changes to their smoking habits during follow-up
DONTA2: Exclusion of subjects who died, emigrated or made dramatic changes to their smoking habits during follow-up
FORAST: Cases without symptoms in the last year were excluded
HAWTHO: Base for comparison includes smokers of up to 5 cigarettes per day
HIGGI4: Because of inadequate detail in the report and use of differing age groups in different tables, estimates are rather speculative
ITABAS: Small clinical study, not stated how cases and controls were selected
JAENDI: Study population were those who visited their primary care physician, so may have been less healthy than the general population. Some attempts were made to contact patients who did not visit their physician during the study period, but it is unclear if they were then included in the study. It is not clear why only 7% of subjects were age 65+
JENSEN: All subjects were participants in smoking cessation programme
KHOURY: 13% of sample were 1st degree relatives of COPD cases, and a further 3% were 1st degree relatives of lung cancer cases
KLAYTO: Base for comparison includes smokers of up to 5 pack-years
KUBIK: Base for comparison includes smokers of up to 3 cigarettes per day
LUNDB2: A few subjects were analysed as controls (as determined at the start of the study) even if diagnosed with CB or asthma at the second phase of the study when the diagnosis category of the cases was determined
MARAN1: Base for comparison includes smokers of up to 0.5 pack-years
MARAN2: Base for comparison includes smokers of up to 0.5 pack-years
MOLLER: Small clinical study, not clear how subjects were selected
NIEPSU: Numbers of smokers not given for subset of participants undergoing spirometry (74%), therefore estimated using same proportions as whole study sample
OMORI: Different diagnostic techniques used for smokers
OSWAL2: Base for comparison includes smokers of up to 5 cigarettes per day or who had smoked for less than 5 years
PETO: Three of the samples were drawn from mining areas with over 60% miners or other dusty jobs, implying about 40% of the overall sample were occupationally exposed
PRATT: Study contained small number of subjects who were cotton or tobacco farmers or who worked in tobacco factory
RICCIO: Subjects were recruited through a respiratory clinic but it is not stated whether they all had respiratory conditions. The definition of a smoker seems implausible
SHIMUR: Small autopsy study, not clear how subjects were selected
STERLI: All decedents proxy vs. none of living sample. Living sample 1 year later than decedents
TAGER2: Age distribution for both men and women in study sample was significantly different from general population from which sample was drawn. Subjects who smoked but did not inhale were excluded
TVERDA: Includes acute bronchitis
VOLLM1: Study population consists of volunteers who responded to extensive media advertising and cohort is biased towards those with respiratory disease, and analysis restricted to those with follow-up data. Subjects with abnormal baseline FEV were not invited to some phases so may have different follow-up rate
VOLLM2: Study population consists of volunteers who responded to extensive media advertising and cohort is biased towards those with respiratory disease, and analysis restricted to those with follow-up data. Subjects with abnormal baseline FEV were not invited to some phases so may have different follow-up rate
WALD: Includes ICD9: 416 (chronic pulmonary heart disease) and 519 (other diseases of respiratory system)
WIG: Urban area is not a typical sample, as socio-economic status is above average
ZIETKO: Small clinical study, not clear how controls were selected
Note that weakness is in respect of the current review, and is not a criticism of the original study which may have been designed with different objectives.
g Study conducted in employed or occupational group:
DOPICO: outdoor workers for city and power company
FLETCH: men-postmen, women-clerical workers
JOSHI: employees at machine tool factory and woollen hosiery mill
KLAYTO: employees at two research facilities
LAM1: employees in a machine factory
SHARP: clerical and light assembly workers at power company
SUADIC: armed forces, customs service, railway, telephone, post, banking and construction companies
WAGEN2: heterogeneous population of employees from different companies and organisations
WOOLF: employees of large commercial firms
h Study conducted in mixed groups:
HAENSZ: nationwide sample plus siblings of migrants to USA still resident in Norway
HAWTHO: occupational groups (from industry, not otherwise specified) and census-identified sample
KATANC: whites from Medicare, blacks from general population
Of the 116 principal studies of COPD, outcome is based on ICD codes in 29 (25.0%), and lung function only in 59 (50.9%). The GOLD criteria are used in 27 (23.2%) studies, with MRC, ATS, ERS or BTS criteria used in 12 (10.3%). In 69 (59.5%) studies the subjects' asthma status is ignored, in 18 (15.5%) all asthmatic subjects are excluded, and in 14 (12.1%) the disease definition includes asthma. Only 19 (16.4%) of the 116 principal studies mention conducting spirometry after use of a bronchodilator. The outcome is based on prevalence in 79 (68.1%) principal studies, mortality in 28 (24.1%) and incidence in 10 (8.6%). In the principal studies, the median number of subjects is 2,033, and of cases 131 (range 13 to 32,822).
Of the 87 principal studies of CB, the outcome is based on symptoms (not lung function) in 59 (67.8%), and on ICD in only six (6.9%). Other studies use self-report, a doctor diagnosis, or other definitions. The MRC questionnaire is used in 21 (24.1%). Asthmatics are excluded totally from six (6.9%) studies, with asthmatics excluded only from the controls in three (3.4%). The outcome is based on prevalence in 78 (89.7%) of the principal studies, mortality in six (6.9%) and incidence in three (3.4%). The median number of subjects is 2,826, and of cases 193.5 (range 2 to 4,769).
Of the 26 principal studies of emphysema, the outcome is based on visual comparison in 10 (38.5%), on diagnosis in seven (26.9%), on ICD codes in five (19.2%) and on other sources including self-report in four (15.4%). Asthmatics are excluded in two (7.7%) studies. The outcome is based on prevalence in 19 (73.1%) of the studies, on mortality in five (19.2%) and on incidence in two (7.7%). The median number of subjects is 2,433, and of cases 96.5 (range 2 to 1,384).

Relative risks
A total of 3,538 RRs are entered, 1,578 for COPD, 1,689 for CB and 271 for emphysema, the number recorded per study varying from 1 to 211. Some 675 relate to subsidiary studies. Table 4 summarizes the distribution of various characteristics of the RRs by outcome, by study type for the principal studies, and overall. For fuller distributions of the RRs, referred to as necessary below, see Additional file 3.
Of the 2,863 RRs in principal studies, 67.8% relate to cross-sectional, 19.8% to prospective, and 12.4% to case-control studies. 81.2% of RRs are sex-specific. About half the RRs (52.0%) are adjusted for one or more variables. Of 1,488 adjusted RRs, age is adjusted for in 1,382 (92.9%) but only 490 (32.9%) are adjusted for variables other than age, sex or other smoking aspects. 34.0% of the RRs are given directly or calculated from a 2 × 2 or 2 × 2 × ℓ table, the rest being derived.
Of the 3,538 RRs, 1,439 are for major smoking indices, and 2,099 for dose-related indices (including 236 and 439 respectively in subsidiary studies). Of the 1,203 RRs in principal studies for major indices, 34.6% are for ever smoking, 37.8% current smoking and 27.6% ex smoking. 53.6% are for cigarette smoking ignoring other products, 33.8% any product smoking, and 12.6% cigarettes only. The unexposed group is typically never any product (55.8%) or never cigarettes (43.1%).
The distribution of smoking status for the 1,660 RRs in principal studies for dose-related indices differs considerably, with 22.8% for ever smoking, 59.6% current smoking and 17.6% ex smoking. Again, most (59.8%) RRs relate to cigarette smoking ignoring other products. The unexposed group is never smoking (any product or cigarettes) for 50.4% of these RRs, low smoking for 39.2%, and current smoking for 3.9%. 52.7% of RRs are for amount smoked, 8.1% age of starting, 19.8% pack-years, 4.4% years duration, and 15.1% years quit (about half compared to never smokers or long-term quitters, the rest compared to current smokers or short-term quitters). Based on RRs with an unexposed base of never smoking, there are 174 sets of categorical data for amount smoked, 18 for age of starting, 52 for pack-years, 11 for duration of smoking, and 26 for duration of quitting. For emphysema, there are few dose-related data other than for amount smoked.

None of the RRs included in the meta-analyses and meta-regressions show more than minor failures of the validation tests used, attributable to rounding errors or small imprecisions or uncertainties in estimating the RRs and CIs. Additional file 3 provides further detail.
KHOURY subjects were relatives of COPD cases (cases having been identified through Johns Hopkins Hospital respiratory laboratory), relatives of lung cancer and non-pulmonary patients, or community-based samples (neighbours and teachers); KIRAZ rural group using biomass cookers and urban group using fuel oil; NAWA healthy workers/retired persons; OSWAL1 cases were general clinic patients, and civil servants referred after repeated sickness due to bronchitis; PETO occupational groups (transport and clerical workers) and census-identified sample; REID general and migrants from UK and Norway; TANG businessmen/professionals, civil servants, general population from socially deprived area, industrial workers; WEN community cohort were volunteers invited for screening and comprised 25% of population in study areas; other cohort were civil servants and teachers in government employee insurance scheme

The meta-analyses and meta-regressions
The main findings are summarized in the following sections, with tables and forest plots. Fuller results of the meta-analyses for the major smoking variables are given in Additional file 4 for COPD, Additional file 5 for CB and Additional file 6 for emphysema. Similar results for the dose-related smoking variables are given in Additional file 7 for COPD, Additional file 8 for CB and Additional file 9 for emphysema. An Excel file, available as Additional file 10, allows the user to view selected results from all these meta-analyses readily. Detailed meta-regression outputs are given in Additional file 11. For dose-related indices, Additional file 12 presents within-study plots of the dose-response relationships, while Additional file 13 gives results that were originally presented in a form unsuitable for meta-analysis. The interested reader should first refer to Additional file 1, which describes the content and structure of all these Additional files.
A. Risk from ever smoking

Figure 1 Forest plot of ever smoking of any product and COPD-part 1. Table 5 presents the results of a main meta-analysis for COPD based on 129 relative risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 1 and 2. The weights (inverse-variance) are also shown numerically, expressed as a percentage of the overall weight. The studies are sorted in order of sex within study reference (REF). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight. Arrows indicate where the CI extends outside the range allocated. Where the RR value falls outside the range, the size of the plotting symbol indicates the weight but the position is not true to the scale.

Figure 2 Forest plot of ever smoking of any product and COPD-part 2. This is a continuation of Figure 1, presenting the remaining individual study data included in the main meta-analysis for COPD shown in Table 5. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Figure 3 Forest plot of ever smoking of any product and CB-part 1. Table 5 presents the results of a main meta-analysis for CB based on 114 relative risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or cigarettes if any product not available).

certain characteristics are shown in Table 5

Figure 4 Forest plot of ever smoking of any product and CB-part 2. This is a continuation of Figure 3, presenting the remaining individual study data included in the main meta-analysis for CB shown in Table 5. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.
consistency of direction is clear, with only one of the 129 COPD RRs, one of the 114 CB RRs, and none of the 28 emphysema RRs below 1.0. These estimates are little affected by preferring RRs for ever smoking cigarettes to those for ever smoking any product, the random-effects estimates changing to 2.92 (2.65-3.20) for COPD, 2.70 (2.50-2.91) for CB and 4.57 (3.40-6.15) for emphysema. This is partly due to many studies providing only one type of RR, so that for COPD, for example, 117 of the 129 RRs are common to both meta-analyses. Nor are they affected by preferring least-adjusted, rather than most-adjusted RRs, with the estimates now 2.85 ...

Returning to the main meta-analysis (most-adjusted and preferring ever smoking any product), there is also large variation between RRs in the weight they contribute to the analysis. For COPD, of a total weight of 5,116 for the 129 RRs (mean 39.7), the largest weight is 523 for study ZIELI2 for females, with six other RRs having weights of over 200. For CB, of the total of 6,146 for the 114 RRs (mean 53.9), the largest weight is 614 for study LAVECC for sexes combined, with eight other RRs having weights over 200. For emphysema, where the total weight is much lower, 489 (mean 17.5 for the 28 RRs), the weight of 241 for the LAVECC sexes-combined RR contributes almost half.
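To make the role of the weights concrete, the following is a minimal sketch of how an inverse-variance weight can be recovered from a published RR and 95% CI, and how fixed-effect and random-effects estimates can then be pooled. It assumes a normal approximation on the log scale (z = 1.96) and uses the DerSimonian-Laird between-study variance estimator as a common choice; the review's own estimator is not stated in this section, and the function names `log_rr_and_weight` and `pool` are hypothetical.

```python
import math

def log_rr_and_weight(rr, lower, upper, z=1.96):
    """Return (log RR, inverse-variance weight) from an RR and its 95% CI."""
    # Standard error of log RR, assuming the CI is symmetric on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.log(rr), 1.0 / se ** 2

def pool(estimates):
    """Pool (rr, lower, upper) tuples.

    Returns (fixed-effect RR, random-effects RR, Q statistic, degrees of freedom).
    The random-effects step uses DerSimonian-Laird (an assumption of this sketch).
    """
    ys, ws = zip(*(log_rr_and_weight(*e) for e in estimates))
    sw = sum(ws)
    fixed = sum(w * y for w, y in zip(ws, ys)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(ws, ys))
    df = len(ys) - 1
    c = sw - sum(w * w for w in ws) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each study variance.
    ws_re = [1.0 / (1.0 / w + tau2) for w in ws]
    random_effects = sum(w * y for w, y in zip(ws_re, ys)) / sum(ws_re)
    return math.exp(fixed), math.exp(random_effects), q, df
```

Because the random-effects weights are flattened by the between-study variance, a single very large study (such as one contributing almost half the fixed-effect weight) has less influence on the random-effects estimate than on the fixed-effect one.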
In investigating sources of heterogeneity, variation was studied firstly using a univariate approach, the results for the characteristics considered in Table 5 ...

Figure 5 Forest plot of ever smoking of any product and emphysema. Table 5 presents the results of a main meta-analysis for emphysema based on 28 relative risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale. The weights (inverse-variance) are also shown numerically, expressed as a percentage of the overall weight. The studies are sorted in order of sex within study reference (REF). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight. Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Variation by other characteristics (see Additional file 10) was also studied. For no outcome is there any clear evidence that RRs varied by the type of tobacco (blended or Virginia) typically used in the country where the study was conducted, by the lowest, or highest, age of subjects included in the study, by presence of the study weaknesses defined in Table 2, by whether the outcome was assessed using a bronchodilator (only relevant to COPD), or by whether the RR was directly available, derived from 2 × 2 tables provided, or using more complex methods. Differences are seen by start year of the study, but, like publication year, they do not follow any clear pattern over time.
For emphysema, estimates are higher (p < 0.001) for studies providing RRs only for ever smoking than studies providing RRs for both ever smoking and current smoking, with random-effects estimates, respectively, 5.51 (4.08-7.43, n = 11) and 3.77 (2.63-5.42, n = 17). Sexes combined RRs tend to be lower if adjusted for sex for COPD (p < 0.05) and emphysema (p < 0.001), but not for CB. RRs adjusted for more factors tend to be lower for COPD (p < 0.01), CB (p < 0.01) and emphysema (p < 0.001). This is unsurprising given the results already noted for adjustment for age and for factors other than age or sex.
For COPD, the relationship to the characteristics was also studied separately for three subtypes of outcome: based on mortality (31 RRs), on lung function (62 RRs) and on other definitions (42 RRs). The tendency for RRs to be higher for North American studies is clearest when outcome is based on mortality, also evident when based on lung function only, but not evident when based on other definitions. The relationship of risk to study type cannot usefully be studied as nearly all relevant mortality studies are prospective, and nearly all other studies are cross-sectional. Similarly most data from mortality studies are of onset, whereas most data from other studies are of prevalence. The higher RRs noted in the overall results for smoking of cigarettes only are also evident solely in the mortality studies, as no RRs for this exposure are included for the other COPD subtypes. There is, however, a consistent tendency for all subtypes for RRs to be higher when the comparison group is never smoking of any product than when it is never smoking of cigarettes.
As only three CB RRs based on mortality are included, the relationship to the characteristics for CB is only studied separately for two subtypes-outcomes based on symptoms (83 RRs), and other than on mortality or symptoms (28 RRs). Tendencies noted in Table 5 for RRs to be higher in males than females, lower if adjusted for age than if unadjusted, and lower if the unexposed base group is never cigarettes than if it is never any product, are apparent for both subtypes.
For emphysema, the relationship to the characteristics separated by subtype of outcome cannot usefully be studied due to limited numbers, with four of the 28 RRs being based on mortality, and 24 based on other definitions.
In an attempt to evaluate the independent role of the characteristics, meta-regression analyses were conducted for COPD and CB, the results from the basic model being summarized in Table 6. There are too few data points for emphysema for useful meta-regression analysis, especially since almost half the total weight comes from one study (LAVECC).
For COPD the deviance reduces from 1,038.1 on 128 degrees of freedom to 421.8 on 112 degrees of freedom on fitting the basic model, substantially reducing, but not eliminating, the heterogeneity. The results in Table 6 demonstrate an independent role of six characteristics noted in the univariate analyses: sex (lower RRs for females), continent (higher for North America), smoking product (higher for cigarette smokers than smokers of any product), the unexposed base (higher for never any product than never cigarettes), and particularly the outcome subtype (lower when based on lung function), and the way asthma is taken into account (lowest when asthma is included in the COPD definition). Effects of adjustment and of age are not clearly seen, however. For none of the secondary characteristics does inclusion in the model significantly improve the fit (at p < 0.05). This includes study type and analysis type, which are highly significant (p < 0.001) in the univariate analyses shown in Table 5. Both these are highly correlated with outcome subtype; thus where mortality is the outcome, the study type will nearly always be prospective, and the analysis type will nearly always be onset.
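The phrase "substantially reducing, but not eliminating" can be read off the deviance per degree of freedom. The short sketch below computes this ratio for the two COPD fits quoted above, assuming the deviance can be interpreted as a heterogeneity chi-squared (so that a ratio near 1 would indicate no residual heterogeneity). The helper name `deviance_per_df` is hypothetical.

```python
def deviance_per_df(deviance, df):
    """Deviance per degree of freedom; values well above 1 suggest residual heterogeneity."""
    return deviance / df

# COPD, ever smoking: figures quoted in the text above.
null_model = deviance_per_df(1038.1, 128)   # before fitting the basic model
basic_model = deviance_per_df(421.8, 112)   # after fitting the basic model

print(round(null_model, 2), round(basic_model, 2))  # 8.11 3.77
```

The ratio falls by more than half but remains well above 1, which is consistent with the statement that heterogeneity is reduced but not eliminated.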
Inspection of standardized residuals from the basic model for COPD reveals two estimates where the observed RR differs markedly from the RR fitted by the model. The largest residual of -3.49 ...

For CB the deviance reduces from 657.1 on 113 degrees of freedom to 433.3 on 103 degrees of freedom on fitting the basic model, again substantially reducing, but not eliminating, the heterogeneity. Though the direction of differences by level of the various characteristics is quite similar to that for COPD, the effects of individual characteristics are less clear, with significant differences (at p < 0.05) only for continent, how asthma was taken into account, and age-adjustment. No secondary characteristics help to improve the model fit (at p < 0.05), except for publication year, where a tendency is seen for earlier published studies to provide higher RRs.

Table 7.
As for ever smoking, the RRs for COPD, CB and emphysema are heterogeneous (p < 0.001), with the largest seen being 43. For the main meta-analysis, the studies contributing the most to the total weight are the same as for the corresponding meta-analysis for ever smoking: ZIELI2/ females for COPD (11.7% of the total of 4,226), and LAVECC/sexes combined for CB (11.4% of 4,326) and emphysema (61.9% of 287).
For the characteristics considered in Table 7 the pattern of variation seems quite similar to that for ever smoking in Table 5. Thus, as for ever smoking, RRs tend to be higher for males and for North American studies for all three outcomes, higher for prospective studies for COPD, and higher when based on mortality for COPD and CB, with no obvious variation by study size, and an erratic pattern for publication year. RRs also show a similar pattern by how asthma is taken into account for COPD to that seen for ever smoking, and are again higher when based on onset for COPD, higher for cigarette only smoking for COPD, higher when the unexposed group is never smoked any product for COPD, and lower for RRs unadjusted for age for CB. As for ever smoking, variation in RRs by other characteristics (not shown in Table 7) was also studied. For most of these there seems little evidence of any difference. For COPD, there is a tendency (p < 0.001) for estimates to be higher for studies providing RRs only for current smoking than for studies providing RRs for both ever smoking and current smoking, with random-effects estimates, respectively, 4.52 (2.69-7.59, n = 10) and 3.40 (3.00-3.87, n = 110), but no such differences are seen for CB and emphysema. Compared to the results for ever smoking, there seems less clear evidence of an effect of adjustment, except as already noted for adjustment for age for CB (Table 7).
For COPD, the relationship to the characteristics was also studied separately for outcomes based on mortality (33 RRs), based only on lung function (58 RRs) and based on other definitions (36 RRs). As for ever smoking, risk is higher in North American studies when the outcome is based on mortality or lung function, but not when based on other definitions. Also as for ever smoking, and for reasons stated in the previous section, variation cannot usefully be studied by study type, or by analysis type (onset or prevalence), or in relation to smoking of cigarettes only. Again RRs are consistently higher for all the outcome subtypes when the comparison group is never smoking of any product than when it is never smoking of cigarettes.
As only four CB RRs based on mortality are included, the relationship to the characteristics for CB is only studied separately for outcomes based on symptoms (81 RRs) and based other than on mortality or symptoms (28 RRs). The tendency noted in Table 7 for RRs to be higher for North American studies is only evident when outcome is based on symptoms, but the tendency for RRs to be lower if adjusted for confounders seems evident in both groups.
As is the case for ever smoking, the relationship to the characteristics by outcome subtype cannot usefully be studied for emphysema due to limited numbers, with only four of 28 RRs based on mortality.
Also as for ever smoking, meta-regression analyses are conducted for COPD and CB, the results from the basic model being summarized in Table 8.
For COPD the deviance reduces from 1,643.4 on 119 degrees of freedom to 433.3 on 103 degrees of freedom on fitting the basic model. The results in Table 8 ... product, the unexposed group, outcome subtype, and the way asthma is taken into account. A significant effect (p < 0.05) of age is also seen. No secondary variable significantly improves the fit (at p < 0.05) ... age, and how asthma is taken into account. As for ever smoking, the largest standardized residuals are for males in study GOLDBE (-3.44) and females in study JOUSI1 (-2.88).

Figure 6 Forest plot of current smoking of any product and COPD-part 1. Table 7 presents the results of a main meta-analysis for COPD based on 120 relative risk (RR) and 95% confidence interval (CI) estimates for current smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 6 and 7.

Figure 7 Forest plot of current smoking of any product and COPD-part 2. This is a continuation of Figure 6, presenting the remaining individual study data included in the main meta-analysis for COPD shown in Table 7. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Figure 8 Forest plot of current smoking of any product and CB-part 1. Table 7 presents the results of a main meta-analysis for CB based on 113 relative risk (RR) and 95% confidence interval (CI) estimates for current smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 8 and 9. The weights (inverse-variance) are also shown numerically, expressed as a percentage of the overall weight. The studies are sorted in order of sex within study reference (REF). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight. Arrows indicate where the CI extends outside the range allocated.

Figure 9 Forest plot of current smoking of any product and CB-part 2. This is a continuation of Figure 8, presenting the remaining individual study data included in the main meta-analysis for CB shown in Table 7. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

C. Risk from ever or current smoking
In an attempt to incorporate data from all the studies (except those with only dose-related data), additional analyses were carried out. The first set of analyses uses results for ever smoking if available from a study, but if not results for current smoking. Conversely, the second set prefers results for current smoking to results for ever smoking where both are available. The meta-analysis RRs are shown in Table 9. The RRs are intermediate between those for ever smoking (Table 5) and those for current smoking (Table 7). For example for COPD, random-effects estimates are 2.89 (95% CI 2.63-3.17) specifically for ever smoking, 3.00 (2.71-3.32) preferring ever smoking, 3.46 (3.07-3.90) preferring current smoking, and 3.51 (3.08-3.99) specifically for current smoking. As so many of the RRs are common between the specific ever smoking analyses in Table 5 and the analyses preferring ever smoking in Table 9, the pattern of RRs by level of the characteristics studied tends to be quite similar. The same is true for the specific current smoking analyses and the analyses preferring current smoking in Table 9. Results for ever or current smoking by level of selected characteristics are therefore only presented in the Additional files.

For the main meta-analysis, the studies contributing the most to the total weight are ZIELI2/females for COPD (11.9% of the total of 3,510), and LAVECC/sexes combined for CB (13.1% of 2,493) and emphysema (48.4% of 300).

D. Risk from ex smoking
For the characteristics considered in Table 10, the pattern of variation is quite similar to that for current smoking seen in Table 7. Thus, for COPD, RRs are, for both current and ex smoking, higher in males, lower in European studies, lower in cross-sectional studies, higher where the outcome is mortality, higher for cigarette only smoking and higher for never any product as the unexposed base. For CB, RRs are higher for mortality for both current and ex smoking, but the differences by continent seen for current smoking are not evident for ex smoking. The same is true for differences by age-adjustment (not shown in Table 10). The small numbers of emphysema RRs for ex smoking (17) preclude reliable study of variation by level of the characteristics of interest. Further details of variations in RRs by level of the characteristics for all three outcomes, overall and (for COPD and CB) by outcome subtype are given in the Additional files.

a Within each study, results are selected in the following order of preference, within each sex, for: unexposed group-never any product, never cigarettes, other; smoking product-any, cigarettes (ignoring other products), cigarettes only; overlapping studies-principal, subsidiary; and then for single sex results in preference to combined sex results. Results adjusted for the most potential confounders are selected.
b n = number of estimates combined, F = fixed-effect meta-analysis RR (95% CI), R = random-effects meta-analysis RR (95% CI), H = heterogeneity chi-squared per degree of freedom, P H = probability value for heterogeneity expressed as p < 0.001, p < 0.05, p < 0.1 or NS (p ≥ 0.1), P B = probability value for between levels (see methods) similarly expressed.
c Includes acceptable near-equivalent estimate (see methods) if estimate for strictly defined never smoker base not available (COPD: 3 for never cigarettes, CB: 2 for never any product and 4 for never cigarettes).
These results, which also include some meta-analyses by level of selected characteristics, show that the increase with amount smoked is also clearly evident using an alternative set of key values (1,10,20,30,40,999), though numbers of available RRs are quite sparse for the higher values, and using least-adjusted rather than most-adjusted RRs. The additional files also include available results for some other studies which present dose response data in a form that cannot readily be included in the meta-analyses (e.g. comparison of mean or median consumption in cases and non-cases). These results do not appear inconsistent with those summarized in Table 11.

F. Risk by age of starting to smoke
There is rather limited evidence available for age of starting, with only 10 studies for COPD, three for CB and one for emphysema providing data usable in meta-analyses. Table 12 summarizes the meta-analysis results. Random-effects RRs for earliest compared to latest starting are significantly elevated for COPD (1.49, 1.26-1.76, n = 14) and CB (2.08, 1.29-3.35, n = 6), but not for emphysema (1.14, 0.70-1.88, n = 2). The increase in risk with earlier starting seen for COPD is consistent with the results of the key value analyses, with random-effects estimates rising to 3.12 (2.07-4.70, n = 8) for categories containing 14, but not 18 years.

... (1,10,20,30,999), and using least-adjusted rather than most-adjusted RRs. The additional file also summarizes results for quite a number of other studies presenting dose-response data in a form that cannot readily be meta-analysed. Many of these reported a significantly increased risk with increasing pack-years.

H. Risk by duration of smoking
Evidence for duration of smoking that can be used in meta-analyses is only available for three studies for COPD, three for CB and two for emphysema. Table 14 summarizes the results of the meta-analyses, which for CB and emphysema are based on heterogeneous data.

I. Risk by duration of quitting (vs. never smoking)
The number of studies providing usable data for duration of quitting compared to never smoking is seven for COPD, and seven for CB, but none for emphysema. As shown in Table 15, there is some evidence of higher risks in short-term quitters for COPD, with the shortest vs. longest random-effects meta-analysis estimate 2.21 (1.24-3.94, n = 10) and a tendency for estimates to be lower for the longer-term quitters in the key value analysis, though the trend is not monotonic. For CB, evidence of higher risks in short-term quitters is less convincing, with the shortest vs. longest estimate of 1.25 (0.99-1.59, n = 11) not significant, and RRs varying little by key value. The results are limited by the variability of the categories used by different studies to classify duration of quitting. This makes it difficult to find a key scheme which includes sufficient numbers of studies across the range. For instance, for COPD, the key scheme shown in Table 15 includes only three RRs at the two shorter levels, whereas an alternative set of key values (20, 12 and 3 years, shown in the Additional files) incorporates only three RRs at the two longer levels.

a Within each study, results are selected in the following order of preference, within each sex, for: smoking status-ever, current or current, ever according to analysis; unexposed group-never any product, never cigarettes, other; smoking product-any, cigarettes (ignoring other products), cigarettes only; overlapping studies-principal, subsidiary; and then for single sex results in preference to combined sex results. Results adjusted for the most potential confounders are selected.
b n = number of estimates combined, F = fixed-effect meta-analysis RR (95% CI), R = random-effects meta-analysis RR (95% CI), H = heterogeneity chi-squared per degree of freedom.

Figure 11 Forest plot of ex smoking of any product and COPD-part 1. Table 10 presents the results of a main meta-analysis for COPD based on 110 relative risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 11 and 12.

For duration of quitting compared to current smoking, data are available from one fewer study than for duration of quitting compared to never smoking for COPD, but from the same studies for CB. The longest vs. shortest analysis shown in Table 16 is the inverse of the shortest vs. longest analysis in Table 15. The key value analyses are based on a limited number of RRs but are consistent with the association declining with longer-term quitting. For categories including 12, but not 7, years quitting random-effects meta-analysis RRs relative to current smoking are 0.52 (0.37-0.71, n = 9) for COPD and 0.65 (0.41-1.04, n = 9) for CB.
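The statement that a longest vs. shortest analysis is the inverse of a shortest vs. longest analysis amounts to taking reciprocals of the RR and swapping the confidence limits. The sketch below shows only that arithmetic, applied for illustration to the shortest vs. longest COPD estimate quoted in the previous section (2.21, 1.24-3.94); since the COPD data in Table 16 come from one fewer study, the published Table 16 value may differ. The function name `invert_rr` is hypothetical.

```python
def invert_rr(rr, lower, upper):
    """Invert an RR and its CI; the limits swap because 1/x is a decreasing function."""
    return 1.0 / rr, 1.0 / upper, 1.0 / lower

rr, lo, hi = invert_rr(2.21, 1.24, 3.94)
print(f"{rr:.2f} ({lo:.2f}-{hi:.2f})")  # 0.45 (0.25-0.81)
```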

Further analyses based on within-study differences
Some studies provide independent RRs for males and females for the same definition of outcome and exposure. Random-effects meta-analysis of the male/female sex ratio for current and ever smoking for each outcome confirms the impression already gained from the analyses shown in Tables 5 to 8 that RRs are somewhat higher for males, though again the difference is not always statistically significant. For ever smoking, the meta-analysis RRs of the sex ratio are 1. ...

Figure 12 Forest plot of ex smoking of any product and COPD-part 2. This is a continuation of Figure 11, presenting the remaining individual study data included in the main meta-analysis for COPD shown in Table 10. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Figure 13 Forest plot of ex smoking of any product and CB-part 1. Table 10 presents the results of a main meta-analysis for CB based on 105 relative risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale in Figures 13 and 14. The weights (inverse-variance) are also shown numerically, expressed as a percentage of the overall weight. The studies are sorted in order of sex within study reference (REF). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight. Arrows indicate where the CI extends outside the range allocated. Where the RR value falls outside the range, the size of the plotting symbol indicates the weight but the position is not true to the scale.

Figure 14 Forest plot of ex smoking of any product and CB-part 2. This is a continuation of Figure 13, presenting the remaining individual study data included in the main meta-analysis for CB shown in Table 10. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.

Some studies also provide separate non-independent least-adjusted and most-adjusted RRs for the same definition of exposure. There is little evidence that adjustment reduces the RR for ever or current smoking. For ever smoking, using the same preferences as in the main meta-analyses (Figures 1, 2, 3, 4 and 5), the most-adjusted estimate is lower than the least-adjusted estimate for 14 of the 30 (46.7%) pairs for COPD, for 18 of the 41 (43.9%) pairs for CB, and for 2 of the 5 (40.0%) pairs for emphysema. For current smoking the corresponding numbers are 11/26 (42.3%) for COPD, 16/36 (44.4%) for CB and 2/3 (66.7%) for emphysema. In no case do the percentages differ from 50% (at p < 0.05), and in each case the random-effects meta-analysis estimate based on the least-adjusted pair members is similar to the corresponding estimate based on the most-adjusted pair members (data not shown).

After excluding studies with no pipe or cigar smokers, some studies allow comparison of RRs of the risk of current smoking vs. never smoking for cigarette smokers ignoring other products with equivalent RRs for cigarette only smokers. These estimates are non-independent. For 7 of the 9 pairs of RRs for COPD, for all 6 of the pairs for CB (p < 0.05) and for both the pairs for emphysema the RR is lower for cigarette smokers ignoring other products.
However the RR ratio is never markedly different from 1, ranging from 0.78 to 1.13 for COPD, from 0.84 to 0.99 for CB, and from 0.86 to 0.96 for emphysema.
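The within-study pair comparisons above amount to asking whether a split such as 14 of 30 differs from 50%. The text does not name the test used, so the following is only a minimal sketch assuming an exact two-sided binomial (sign) test; the function name `sign_test_p` is hypothetical, and the counts are those quoted in the text.

```python
from math import comb

def sign_test_p(k: int, n: int) -> float:
    """Exact two-sided binomial (sign) test of k 'lower' pairs out of n against 50%.

    Assumption: this is an illustrative choice of test, not necessarily
    the one used by the authors.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n  # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n  # P(X >= k)
    # The null distribution is symmetric, so double the smaller tail.
    return min(1.0, 2 * min(lower, upper))

# (label, pairs in which the first-named RR is lower, total pairs)
pairs = [
    ("COPD, most- vs least-adjusted, ever smoking", 14, 30),
    ("CB, most- vs least-adjusted, ever smoking", 18, 41),
    ("COPD, cigarettes ignoring other products vs cigarettes only", 7, 9),
    ("CB, cigarettes ignoring other products vs cigarettes only", 6, 6),
]
for label, k, n in pairs:
    print(f"{label}: {k}/{n}, p = {sign_test_p(k, n):.3f}")
```

Under this assumed test, only the 6-of-6 split for CB falls below p = 0.05, which agrees with the significance markers given in the text.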
RRs for a dose-related index of smoking may be adjusted for other such indices. However, this is only at all common for age of starting to smoke, where adjustment for amount smoked is carried out in five of the 10 studies providing data for COPD, and in one of the three providing data for CB. It is not possible to assess the effect of adjustment for amount smoked, as three of the six relevant studies provide the adjusted RR and no other RR, and the other three provide only adjusted and totally unadjusted RRs.
For all three outcomes, Egger's test [16] shows significant evidence of publication bias for both ever smoking (COPD p < 0.001, CB p < 0.05, emphysema p < 0.01) and current smoking (COPD p < 0.05, CB p < 0.001, emphysema p < 0.05). Figures 16 (COPD), 17 (CB) and 18 (emphysema) show funnel plots for ever smoking. All the plots give an impression of there being more lower-weight RRs above the mean and more higher-weight RRs below the mean.
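For readers unfamiliar with Egger's test, the sketch below shows the usual regression form: the standardised effect (log RR divided by its standard error) is regressed on precision (1 / standard error), and an intercept far from zero indicates funnel-plot asymmetry. The data are hypothetical and the function name `egger_regression` is an assumption for illustration; it does not reproduce the paper's analysis.

```python
from math import sqrt

def egger_regression(log_rr, se):
    """Ordinary least squares of (log RR / SE) on (1 / SE).

    Returns (intercept, slope, se_intercept). The intercept divided by its
    standard error is referred to a t distribution on n - 2 degrees of freedom.
    """
    n = len(log_rr)
    if n < 3 or n != len(se):
        raise ValueError("need at least three estimates with matching SEs")
    x = [1.0 / s for s in se]                 # precision
    y = [b / s for b, s in zip(log_rr, se)]   # standardised effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = rss / (n - 2)
    se_intercept = sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept

# Hypothetical data: smaller studies (larger SE) report larger log RRs.
log_rr = [0.9, 1.0, 1.2, 1.5, 1.9]
se = [0.05, 0.10, 0.20, 0.35, 0.50]
a, b, se_a = egger_regression(log_rr, se)
print(f"intercept = {a:.2f}, t = {a / se_a:.2f} on {len(se) - 2} df")
```

A positive intercept, as in this made-up example, corresponds to the pattern described for the funnel plots: lower-weight estimates lying above the mean.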

Evidence of a relationship
The meta-analyses carried out demonstrate a clear relationship of smoking to all three outcomes considered: COPD, CB and emphysema. This is evident for ever, current and ex smoking, and for outcomes based on mortality, lung function, symptom prevalence or other methods. That this relationship is causal is supported by the evidence of a dose-response relationship, with risk increasing with amount smoked and pack-years for all three outcomes, and (based on more limited data) decreasing with increasing age of starting to smoke for COPD and CB, and with increasing duration of quitting for COPD. It is also supported by the similarity of results based on most-adjusted and least-adjusted RRs, and by within-study comparisons showing that additional confounder adjustment had little effect on estimates for the same exposure definition.
Figure 15 Forest plot of ex smoking of any product and emphysema. Table 10 presents the results of a main meta-analysis for emphysema based on 17 relative risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or cigarettes if any product not available). The individual study estimates are shown numerically and graphically on a logarithmic scale. The weights (inverse-variance) are also shown numerically, expressed as a percentage of the overall weight. The studies are sorted in order of sex within study reference (REF). In the graphical representation individual RRs are indicated by a solid square, with the area of the square proportional to the weight. Arrows indicate where the CI extends outside the range allocated. Also shown are the combined random-effects estimates. These are represented by a diamond of standard height, with the width indicating the 95% CI.
Table fragment (final row only; the table body is not recoverable here): Between levels, P_B: < 0.05, < 0.05, NS.
Table footnotes:
a Within each study, results are selected in the following order of preference, within each sex, for: unexposed group (never any product, never cigarettes, other); smoking product (any, cigarettes ignoring other products, cigarettes only); overlapping studies (principal, subsidiary); and then for single sex results in preference to combined sex results. Results adjusted for the most potential confounders are selected.
b n = number of estimates combined, F = fixed-effect meta-analysis RR (95% CI), R = random-effects meta-analysis RR (95% CI), H = heterogeneity chi-squared per degree of freedom, P_H = probability value for heterogeneity expressed as p < 0.001, p < 0.05, p < 0.1 or NS (p ≥ 0.1), P_B = probability value for between levels (see methods) similarly expressed.
c Includes acceptable near-equivalent estimate (see methods) if estimate for strictly defined never smoker base not available (COPD: 3 for never cigarettes, CB: 2 for never any product and 4 for never cigarettes).

Heterogeneity
The studies are remarkably consistent in reporting an increased risk in ever smokers. Only two of the 271 RRs for the three outcomes combined considered in Figures 1, 2 and 3 are less than 1.0. However, studies also vary markedly in the magnitude of the estimated RR, as illustrated by the high values of H seen in the meta-analyses of the major smoking indices, which often exceed 5 and sometimes exceed 10. (H values of 5 and 10 correspond to I² values [17] of 80% and 90%.) This is unsurprising given the many sources of variation involved, including sex, location, timing, study design and populations, definition of outcome and exposure, type of product smoked, and extent of confounder adjustment.
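The link between H and I² quoted above follows from I² = (Q - df) / Q = 1 - 1/H, where Q is the heterogeneity chi-squared and H = Q / df as defined in this paper. The sketch below shows how fixed-effect and random-effects estimates, H and I² could be computed from log RRs and their standard errors. It assumes the DerSimonian-Laird estimator of between-study variance, which is a common choice but is not confirmed as the authors' method; the input data and the names `meta_analyse` and `i2_from_h` are hypothetical.

```python
from math import exp, log, sqrt

def i2_from_h(h):
    """Convert H (heterogeneity chi-squared per degree of freedom) to I-squared (fraction)."""
    if h <= 0:
        return 0.0
    return max(0.0, 1.0 - 1.0 / h)

def meta_analyse(log_rr, se):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling."""
    k = len(log_rr)
    if k < 2 or k != len(se):
        raise ValueError("need at least two estimates with matching SEs")
    w = [1.0 / s ** 2 for s in se]                       # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    random_ = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    se_re = sqrt(1.0 / sum(w_re))
    h = q / df
    return {
        "fixed_rr": exp(fixed),
        "random_rr": exp(random_),
        "random_ci": (exp(random_ - 1.96 * se_re), exp(random_ + 1.96 * se_re)),
        "H": h,
        "I2": i2_from_h(h),
    }

# Hypothetical RRs and standard errors of log RR, for illustration only.
rrs = [1.8, 2.5, 3.2, 4.0, 6.5]
ses = [0.10, 0.15, 0.12, 0.30, 0.40]
result = meta_analyse([log(r) for r in rrs], ses)
print({key: value for key, value in result.items() if key != "random_ci"})
print(f"H = 5 gives I2 = {i2_from_h(5):.0%}; H = 10 gives I2 = {i2_from_h(10):.0%}")
```

Note that H here is chi-squared per degree of freedom, following the table footnotes, and should not be confused with other statistics that share the same letter.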
Using univariate and multivariate (meta-regression) methods, we investigated variation in risk by a number of characteristics of the study and the RR. For each outcome no characteristic on its own explains a major part of the variation, and substantial excess heterogeneity remains even after fitting multivariate models. However, differences in the strength of the association with smoking by level of some characteristics are apparent, these differences being quite similar for each outcome and each major smoking index. RRs tend to be higher for North American studies, for males, and for cigarette smoking than for smoking of any product. For COPD RRs are substantially higher for studies of mortality or onset, especially those where the definition of COPD excludes asthma, and lower where the definition is lung function based. Studies of mortality are less common for CB or emphysema, but also give relatively high estimates. Effects of some other characteristics, such as study timing and study type, though significant in some univariate analyses, are not significant with the multivariate approach. As some characteristics are correlated (e.g. mortality studies are often prospective, US studies are more often prospective than elsewhere, and studies using lung function criteria are commonly cross-sectional) it is not straightforward to identify underlying effects. However, we feel that the main meta-regression models for ever smoking (Table 6) and current smoking (Table 8) for COPD and CB are useful in explaining some of the heterogeneity, their usefulness being confirmed by the fact that adding in further characteristics did not significantly improve prediction. Particularly for COPD, the meta-regressions show there are many characteristics that independently modify the risk estimates. Meta-regressions were not tried for emphysema, where there were

Sex
Where possible, sex-specific results are included in the meta-analyses, with combined sex results included only where sex-specific results are unavailable. Though variation by sex was not significant in all the main analyses, risk estimates generally tended to be higher for males than females. This is supported by additional analyses comparing RRs within-study for the same outcome and exposure definition. The higher RRs for males do not necessarily indicate any greater susceptibility, and seem more likely to reflect increased smoking. We note that some publications (e.g. [18][19][20]) have suggested that women may have a greater susceptibility than men to the effects of smoking on COPD or lung function, but others (e.g. [21][22][23]) have suggested the opposite. A detailed overall assessment of this aspect is beyond the scope of this paper.

Age
In the meta-regressions a continuous variable was included that indicated the midpoint of the age group to which the RR applied. The fitted coefficient was always positive, but significant (at p < 0.05) only for current smoking for COPD. Note that for each study only RRs for the whole age range were entered, though the availability of age-specific data was recorded. Proper assessment of the relationship of age to the RRs for the different outcomes would require entry and analysis of these further data. For the present, the data can only be regarded as indicating that RRs for studies in older populations may be greater than those in younger populations.

Location
The meta-regressions showed significant variation in risk by continent, mainly due to higher RRs for North American studies. Similar differences are seen in the univariate analyses for emphysema, and also for ex smoking (except for CB). This difference is not readily explained, but it could relate to differences in diagnosis not fully accounted for by the model, in amount smoked, or in type of product smoked. However, a variable accounting for the predominant long-term use of blended cigarettes in some countries (including the US), and of flue-cured Virginia cigarettes in others (including the UK and Canada), did not significantly predict risk.

Study timing
In the univariate analyses of ever and current smoking RRs varied significantly by when the study was published, but the pattern was erratic, with no trend. Study timing did not, however, add predictive power to the multivariate models. This suggests that differences between the periods studied are correlated with differences in other study characteristics. The term COPD has only been widely used in the last 25 years or so, and definitions based on lung-function have been changing, so there may well be differences by time in the nature of outcomes we classified as COPD. There have also been changes in the nature of the product smoked, with reducing tar deliveries of cigarettes and declining use of pipes and cigars [24].

Definition of the disease outcome
For all RRs meta-analysed, the outcome had to be CB specifically, emphysema specifically or COPD generally. Thus each RR applied only to one outcome. The term COPD is quite recent, so data from some earlier studies which might legitimately have been included may have been excluded or entered against the wrong outcome. Some early studies described their outcomes as CB. If they supported their definitions by ICD codes incorporating all the core components of COPD, we reclassified the outcome as COPD. However, where ICD codes were not given, we left the outcome as CB, though we suspect  that sometimes the outcome might better have been COPD. For COPD, the definitions allowed vary considerably, and the cases may not represent a homogeneous set. Thus population-based cross-sectional studies using lung function criteria alone probably include cases with less severe disease than studies in hospitals or using mortality records. Most prospective studies of incidence made no attempt to trace deaths, so may have omitted more rapidly progressive cases. We have not studied variation in risk in those few studies presenting results by severity of disease. Similar considerations apply to CB and emphysema, though less strongly, partly because there were fewer studies of mortality.
For COPD RRs are higher when the definition was based on mortality than when based on lung function or other criteria. Compared to RRs based on lung function, the meta-regressions indicated that RRs based on mortality are about 1.5 times higher for both ever and current smoking. The tendency for RRs based on mortality to be higher is also seen for CB and emphysema, but based on fewer studies.
For COPD RRs also clearly vary by how asthma was taken into account. For most studies, co-existing asthma was ignored (i.e. diagnosis was made independent of asthma, and both cases and non-cases could include asthmatics). However there were some (mainly mortality) studies where asthma is part of the outcome definition (e.g. COPD = CB, emphysema or asthma). Here, usually only the underlying cause of death is considered, so the possibility of a CB or emphysema case also being recorded as having asthma does not arise. RRs are much lower for these studies. For others, asthmatics had been totally excluded, and RRs tend to be intermediate.

Study type
For COPD particularly, the univariate analyses show a tendency for RRs to be higher for prospective studies than for other designs. Study type did not contribute in multivariate analyses, probably reflecting its strong correlation with disease outcome definition, prospective studies tending to present mortality results, but other study types tending to use lung function, symptoms or other criteria.

Aspects of smoking
For COPD the meta-regressions show significant variation by smoking product, with RRs highest for smokers of cigarettes only, lowest for smokers of any product, and intermediate for smokers of cigarettes (ignoring other products). As the estimates for cigarette only smokers depended largely on just two large studies (HAMMO2, THUN), we further investigated the difference between smokers of cigarettes only and smokers of cigarettes by within-study comparisons. This confirmed the tendency for cigarette only smokers to have higher risks. Though we have not considered data for smoking of pipes and cigars only, the results are consistent with a greater effect of smoking cigarettes than other products. Smokers of any product include some who smoke no cigarettes at all, while smokers of cigarettes include some who smoke cigarettes and pipes/cigars and who are likely to smoke fewer cigarettes per day than smokers of cigarettes only. For CB and emphysema there are few RRs for cigarette only smokers, but these also suggest higher risks for this group. For COPD, the results show a higher RR where the unexposed group is never any product than when it is never cigarettes. This is consistent with the absolute risk being higher where the unexposed group includes some smokers (of pipes/cigars), than where it does not. However, this pattern is not seen for CB and emphysema.
Figure 16 Funnel plot for ever smoking and COPD. Funnel plot of the 129 relative risk estimates for ever smoking and COPD included in the main meta-analysis in Table 5 against their weight (inverse-variance of log RR). The dotted vertical line indicates the fixed-effect meta-analysis estimate.
We investigated the dose-response relationship by meta-analyses for five exposure measures: amount smoked, age of starting, pack-years, duration of smoking, and duration of quitting (both vs. never smokers and vs. current smokers).
Meta-analysis of RRs expressed relative to never smokers or relative to current smokers is hampered by the different categories used by different studies to define level of exposure, so we also analyzed RRs comparing extreme levels of exposure within smokers, an approach allowing all studies to be included (including those only presenting analyses for smokers). For all three outcomes, risk increases with amount smoked and pack-years. For COPD and CB earlier starters have significantly higher risks, and risk also tended to decrease with longer-term quitting. Data are too few for emphysema to make inferences for age of starting and duration of quitting. The only measure showing no dose-relationship is duration of smoking but data are very limited. Note that all the outcomes are chronic diseases and disease presence may affect smoking habits. Depending on when smoking habits are recorded, this may bias downwards associations with these dose-related measures.

Derivation of RRs
About a third of RRs used in meta-analyses are available from the source or can be derived directly from cross-tables of exposure by outcome. Otherwise more complex methods had to be used to derive the required RR. It was reassuring that whether or not the RR was derived did not add predictive power to the main meta-regression model, suggesting that use of derived RRs caused no material bias.
Figure 17 Funnel plot for ever smoking and CB. Funnel plot of the 114 relative risk estimates for ever smoking and CB included in the main meta-analysis in Table 5 against their weight (inverse-variance of log RR). The dotted vertical line indicates the fixed-effect meta-analysis estimate.

Effect of studies with high RRs or large weight
The statistical analyses investigated the role of various characteristics on the estimated risk of the three outcomes in relation to smoking, but did not formally test the effect of exclusion of specific studies with extreme RRs or large weights. For ever and current smoking, we have noted the highest RRs and those contributing most to the total weight. For COPD and CB, where each analysis involves over 100 most-adjusted RRs, no single RR contributes more than 12% of the total weight, and the distribution of RRs and of standardized residuals from the meta-regression models did not suggest any single RR had an undue influence. For emphysema, the situation is different. There are fewer RRs, only 28 for ever smoking and 22 for current smoking, and one study (LAVECC) contributes substantially to the overall weight (49% for ever, 62% for current) while having a relatively low RR (2.05 for ever, 1.76 for current). Furthermore, study AUERBA, which does not provide an RR for ever smoking, has a strikingly large RR of 489.54 for current smoking. We therefore investigated the effect of exclusion of these studies on the combined current smoking RR, where the problem is most severe (Table 17). It can be seen that exclusion of AUERBA substantially reduces the random-effects estimate, while exclusion of LAVECC substantially increases the fixed-effects estimate. Both exclusions, particularly AUERBA, reduce the heterogeneity substantially.
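The kind of exclusion check described above can be sketched as a simple influence table: each study's share of the total inverse-variance weight, together with the pooled estimate obtained when that study is left out. The sketch below uses fixed-effect pooling only and entirely hypothetical data (one dominant low-RR study and one extreme-RR study); it does not reproduce Table 17, and the names `fixed_effect` and `influence_table` are assumptions.

```python
from math import exp, log

def fixed_effect(log_rr, se):
    """Inverse-variance fixed-effect pooled log RR, plus the weights used."""
    w = [1.0 / s ** 2 for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    return pooled, w

def influence_table(labels, log_rr, se):
    """Return the pooled RR and, per study, (label, % of weight, pooled RR without it)."""
    pooled, w = fixed_effect(log_rr, se)
    total = sum(w)
    rows = []
    for i, label in enumerate(labels):
        rest_y = log_rr[:i] + log_rr[i + 1:]
        rest_se = se[:i] + se[i + 1:]
        loo, _ = fixed_effect(rest_y, rest_se)   # leave-one-out estimate
        rows.append((label, 100.0 * w[i] / total, exp(loo)))
    return exp(pooled), rows

# Hypothetical studies: one with a large weight and low RR, one with an extreme RR.
labels = ["large_low", "extreme", "study3", "study4"]
log_rr = [log(2.0), log(400.0), log(5.0), log(6.0)]
se = [0.10, 0.80, 0.20, 0.20]

pooled, rows = influence_table(labels, log_rr, se)
print(f"all studies: RR = {pooled:.2f}")
for label, share, loo_rr in rows:
    print(f"without {label}: RR = {loo_rr:.2f} (weight share {share:.1f}%)")
```

In this made-up example, dropping the heavily weighted low-RR study raises the fixed-effect estimate, while the extreme-RR study has little effect on it because of its small weight. An extreme estimate acts mainly on the random-effects estimate, which would need a separate random-effects calculation.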
Why should the estimates vary so much? LAVECC was a large national health survey in Italy, in which 437/22,376 (2.0%) male and female current smokers of any product and 595/44,172 (1.3%) male and female never smokers of any product reported they had emphysema or respiratory insufficiency, with no independent check on the diagnosis. AUERBA involved an examination of whole-lung sections prepared from lungs removed at autopsy, with 816/839 (97.3%) male current cigarette smokers and 20/176 (11.4%) male never smokers of any product diagnosed as having minimal, slight, moderate, advanced or far advanced emphysema. These percentages differ widely between the two studies and reflect differences in what is considered emphysema. Someone interviewed in a survey would be unaware of lower grades of emphysema. For AUERBA it is possible to derive RRs for higher grades of emphysema. For instance, restricting attention to advanced or far advanced emphysema reduces the rate in the male smokers to 134/839 (16.0%), and in never smokers to zero, so still indicating an extremely high RR.
Figure 18 Funnel plot for ever smoking and emphysema. Funnel plot of the 28 relative risk estimates for ever smoking and emphysema included in the main meta-analysis in Table 5 against their weight (inverse-variance of log RR). The dotted vertical line indicates the fixed-effect meta-analysis estimate.
We also compared the results reported by AUERBA with those reported in the other autopsy studies (ANDER2, PRATT, RYDER and SUTINE), although only results for ever smoking are available in those studies, PRATT being of males and the other studies of both sexes combined. Among never smokers of any product, rates of emphysema (ANDER2 30/51 = 58.8%, PRATT 15/97 = 15.5%, RYDER 21/73 = 28.8%, SUTINE 28/73 = 38.4%) are all much higher than reported by LAVECC and also higher than reported by AUERBA. Among ever smokers of any product (cigarettes only for ANDER2), rates of emphysema (ANDER2 = 89.5%, PRATT = 42.0%, RYDER = 75.5%, SUTINE = 69.2%) are again much higher than reported by LAVECC but clearly lower than reported by AUERBA. While it is clear that emphysema rates based on autopsy studies are much higher than those based on surveys (and also than those based on mortality studies; data not shown), the very high RR seen in AUERBA is due to a far greater discrimination between smokers and never smokers than seen in other autopsy studies. These results emphasise the problem of heterogeneity in deriving combined estimates.

Representativeness
We excluded studies of populations with a co-existing medical condition, with clearly atypical smoking habits (e.g. cocaine users or residents of a homeless shelter), or with clearly atypical risk (e.g. alpha-1 antitrypsin deficiency). Thus most studies include subjects broadly representative of the general population. Some studies had eligibility criteria such as long-term residence, household residence (excluding residents of institutions or military personnel) or telephone subscribers, criteria that may have resulted in underrepresenting subjects with lower SES or more mobile lifestyles. A few studies involved patients attending their physician or clinics, who may have been less healthy than average. It seems unlikely that any of these effects would have materially affected the relationship between smoking and COPD.
Studies of subjects with a high occupational risk for respiratory disease were excluded. The classification of high risk was based on our educated judgment, and not formally tested. Low occupational risk studies included in this report involved armed forces personnel, doctors, nurses, teachers, civil servants, professional and businessmen, coffeehouse and shop workers, postal, telephone, transport and clerical workers, and outdoor workers, as well as persons working in specific factories, research facilities, or unspecified industry.
Some studies included were originally designed along clinical or experimental rather than epidemiological lines, and subject selection was unclear. These studies are generally small, and any non-representativeness would little affect our results.

Other sources of bias
It is well known that researchers are more likely to wish to publish, and editors more likely to accept for publication, studies finding a statistically significant association between exposure and disease. The published literature may therefore overstate any true association or produce a false-positive relationship. There is some formal evidence of publication bias, with Egger's test suggesting bias in a number of the meta-analyses (see Figures 16 to 18). While some small studies showing no association may never have been published, large studies are likely to publish, and it is these which contribute most to the meta-analyses. We have not attempted to quantify bias, as formal methods are all based on assumptions which cannot be tested, but it seems doubtful whether publication bias is a serious issue.
Another possible source of bias is misclassification of smoking status. Random misclassification would dilute the association, as would any tendency for cases to deny or understate their smoking more than for the general population. Any tendency for current smokers to claim to be ex smokers, as might happen in a study conducted in a clinical setting or where patients have been advised to stop smoking, would tend to inflate the risk for ex smoking. Not only may misclassification rates vary by aspects of the study design and the way questions are asked, they may also vary by sex, age or other demographic variables.
Table 17 Investigating the effect of excluding a study with a very large weight (LAVECC) and/or a study with a very high RR (AUERBA) on the meta-analysis estimate for current smoking for emphysema
The meta-analyses were conducted by combining direct estimates of the RR (from prospective studies) with ORs (from case-control and cross-sectional studies and occasionally from prospective studies). ORs somewhat overestimate relative risks where the disease is not rare [25], but here the overestimation is of little practical importance. Based on unadjusted data from prospective studies, where one could calculate both the relative risk and the OR, we estimate that the median bias from using the OR would have been only 1.01 for COPD and emphysema, and 1.04 for chronic bronchitis.
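The point that the OR overestimates the relative risk only when the outcome is common can be illustrated with a crude 2x2 calculation. The sketch below uses the counts quoted earlier for LAVECC (a rare self-reported outcome) and AUERBA (a very common autopsy finding). These are crude, unadjusted figures computed purely for illustration: they will not reproduce the adjusted estimates used in the meta-analyses, neither study is one of the prospective studies on which the median bias quoted above is based, and the function name `risk_measures` is hypothetical.

```python
def risk_measures(cases_exp, n_exp, cases_unexp, n_unexp):
    """Crude relative risk and odds ratio from exposed/unexposed case counts."""
    if not (0 < cases_exp < n_exp and 0 < cases_unexp < n_unexp):
        raise ValueError("each group needs both cases and non-cases")
    p1 = cases_exp / n_exp        # risk in the exposed
    p0 = cases_unexp / n_unexp    # risk in the unexposed
    rr = p1 / p0
    odds_ratio = (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
    return rr, odds_ratio

# Counts as quoted in the text (current smokers vs never smokers).
examples = {
    "LAVECC (rare outcome)": (437, 22376, 595, 44172),
    "AUERBA (common outcome)": (816, 839, 20, 176),
}
for label, counts in examples.items():
    rr, odds_ratio = risk_measures(*counts)
    print(f"{label}: RR = {rr:.2f}, OR = {odds_ratio:.2f}, OR/RR = {odds_ratio / rr:.2f}")
```

When the outcome affects only one or two percent of each group, the OR and relative risk are almost identical; when nearly all exposed subjects have the outcome, the OR becomes many times larger than the relative risk.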

Limitations
This review has various limitations, many unavoidable. Lack of access to individual subject data limits the ability to carry out meta-analyses using similar exposure indices and confounder adjustment throughout, but obtaining such data was not feasible given many studies were conducted years ago. Obtaining a reliable definition of outcome and exposure is often hindered by incomplete information in the source papers. This review is also to some extent limited by restricting attention only to stratification by sex, and not attempting to record RRs subdivided by age or other characteristics. We also limited attention to specific indices of smoking, for example not entering data on pipe or cigar smoking, filter/plain smoking, or tar level. However, we have recorded the availability of such extra information, and further work incorporating such data may give more insights. The procedures conducted for this review were extremely time-consuming and it was impractical to bring the literature included fully up to date. However, consideration of data from 218 studies published between 1953 and 2006 should give a reliable enough picture.

Conclusions
After excluding studies conducted in children or adolescents, or in populations at high respiratory disease risk or with co-existing diseases, we identified, from papers published between 1953 and 2006, 218 studies which relate one or more of a defined set of smoking indices to COPD, CB and emphysema. One hundred and thirty-three of the studies provide relevant data for COPD, 101 for CB and 28 for emphysema.
One major conclusion is that for each outcome the RRs for a given smoking index were markedly heterogeneous.
Another conclusion is that estimates are clearly elevated for all three outcomes. Individual study RRs virtually all exceed 1.0, and based on random-effects meta-analyses of most-adjusted RRs, estimates are elevated for ever smoking (COPD 2.89, CI 2.63-3.17, n = 129 RRs; CB 2.69, 2.50-2.90, n = 114; emphysema 4.51, 3.38-6.02, n = 28), current smoking (COPD 3.51, 3.08-3.99, n = 120; CB 3.41, 3.13-3.72, n = 113; emphysema 4.87, 2.83-8.41, n = 22) and ex smoking (COPD 2.35, 2.11-2.63, n = 110; CB 1.63, 1.50-1.78, n = 105; emphysema 3.52, 2.51-4.94, n = 17). The consistency in direction and the strength of the associations are compatible with a causal relationship. A causal relationship is further supported by the fact that estimates are not materially affected by adjustment for confounding variables, and by the evidence of a dose-response relationship, with risk increasing with amount smoked and pack-years for all three outcomes and (based on more limited data) risk decreasing with increasing starting age for COPD and CB and with increasing quitting duration for COPD.
Our review also provides evidence that various characteristics of the study and RR affect risk estimates. For COPD, RRs are higher for males, for studies conducted in North America, for cigarette smoking rather than any product smoking, where the unexposed base is never smoking any product, and are markedly lower when asthma is included in the COPD definition. Variations by sex, continent, smoking product and unexposed group are in the same direction for CB, but less clearly demonstrated. For all outcomes RRs are higher when based on mortality, and for COPD are markedly lower when based on lung function.
This comprehensive review provides further insight into the relationship of smoking to COPD, CB and emphysema.

Additional material
Additional file 1: Methods. .DOC file giving a fuller version of the Methods section than in the paper [320][321][322][323][324]. Particular topics described in more detail include the following:
• the rules for preferring one outcome definition to another where a study provides multiple qualifying alternatives, and giving the outcomes selected and alternatives not used for these studies. It also gives details of core and allied conditions for each of the three outcomes, and the definitions of COPD based on published criteria of lung function.
• the literature searching, including a flow chart.
• the methods by which RRs and CIs were derived, where required, from the data presented in the source papers.
• the statistical analyses conducted.
It does not include any results itself, but describes the content and structure of additional files 4 to 13 that do provide detailed statistical results.
Additional file 2: Studies. .DOC file concerning the 218 studies included on the database. This describes which studies provided data for which outcome and gives details of the overlapping and linked studies, as well as fuller distributions of study characteristics than those given in the paper and also details of study populations and exclusions. For each of the three outcomes, a study by study description of the full definition of the outcome and source of diagnostic information is given.
Additional file 3: RRs. .DOC file concerning the RRs included on the database. This gives the numbers of RRs per study as well as fuller distributions than those given in the paper of the characteristics of the RRs for the major smoking indices and the dose-response indices, and of the characteristics of the sets of RRs for the dose-response indices.