Systematic review of studies investigating ventilator-associated pneumonia diagnostics in intensive care

Background: Ventilator-associated pneumonia (VAP) is an important diagnosis in critical care. VAP research is complicated by the lack of agreed diagnostic criteria and reference standard test criteria. Our aim was to review which reference standard tests are used to evaluate novel index tests for suspected VAP.
Methods: We conducted a comprehensive search of electronic databases and hand-searched reference lists. The Cochrane Library, MEDLINE, CINAHL, EMBASE, and Web of Science were searched from 2008 until November 2018, using search terms covering VAP, diagnostics, and the intensive treatment unit. We adapted the Critical Appraisal Skills Programme (CASP) checklist for diagnostic studies to assess the quality of the included studies.
Results: We identified 2441 records, of which 178 were selected for full-text review. Following methodological examination and quality assessment, 44 studies were included in the narrative data synthesis. Thirty-two (72.7%) studies used a sole microbiological reference standard; the remaining 12 used a composite reference standard, nine of which included a mandatory microbiological criterion. Histopathological criteria were optional in four studies and mandatory in none.
Conclusions: Nearly all reference standards for VAP used in diagnostic test research required some microbiological confirmation of infection, with bronchoalveolar lavage (BAL) culture being the most common reference standard.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12890-021-01560-0.


Background
Ventilator-associated pneumonia (VAP) refers to inflammation of the lung parenchyma caused by infectious agents acquired specifically while receiving invasive mechanical ventilation [1,2]. VAP is a preventable nosocomial complication that potentially contributes to avoidable mortality and morbidity [3,4], and it is therefore considered a clinically and epidemiologically important measure of the quality of care [5,6]. VAP also consumes additional resources, adding time and expense to an intensive care stay, and accounts for a large proportion of all antibiotic prescriptions [7]. It is estimated to add approximately $40,000 per episode in the US [8,9] and around £9000 in the UK [10]. The contribution of an episode of VAP to mortality is difficult to ascertain definitively because confounders in the at-risk population are numerous and severe [4,11-13]; reported estimates of attributable mortality range from high to neutral or near-neutral [11-13].
Over recent decades, investigators have not adopted a fixed set of criteria or a fixed definition for VAP [14]. This lack of a reference standard has prevented comparisons across studies and created uncertainty about VAP incidence [15]. The incidence of VAP varies widely between studies depending on the diagnostic criteria used, the type of intensive therapy unit (ITU), and the patient population [16,17].
Existing literature reports that the incidence of VAP varies widely, between 4.0% and 28.8% of the at-risk population [8,18-24], with an event rate between 1.4 and 16.5 per 1000 ventilator days [1,25-27]. As VAP rates have become an important quality indicator, the US Centers for Disease Control and Prevention (CDC) and the European Centre for Disease Prevention and Control (ECDC) use their own precise case definitions to identify VAP events [28,29]. Both definitions return similar VAP rates, making them adequate for surveillance and for benchmarking critical care units internationally [1]. However, because the two definitions lack concordance, neither makes an ideal reference standard [1,28], and their disagreement further highlights the difficulty of reaching consensus on diagnosing VAP. Microbiological samples, especially quantitative culture of bronchoalveolar lavage (BAL) fluid, are considered integral to the diagnosis of VAP [30,31]. However, a 2008 systematic review of diagnostic methods found that microbiological methods did not add to the accuracy of diagnosis over clinical criteria and that all respiratory sampling methods were equivalent [32]. The continuing lack of an agreed reference standard hampers research into novel diagnostic methods. The aim of this review was to identify which reference standards have been used in diagnostic evaluation research for VAP.
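The two incidence measures quoted above use different denominators: the number of patients at risk, and the time spent at risk. The Python sketch below shows both calculations; the function names and the unit-level counts are hypothetical and chosen only for illustration.

```python
def cumulative_incidence_pct(vap_patients: int, at_risk_patients: int) -> float:
    """Percentage of at-risk (ventilated) patients who develop VAP."""
    if at_risk_patients <= 0:
        raise ValueError("at_risk_patients must be positive")
    return 100.0 * vap_patients / at_risk_patients


def incidence_density_per_1000(vap_episodes: int, ventilator_days: float) -> float:
    """VAP episodes per 1000 ventilator days (time-at-risk denominator)."""
    if ventilator_days <= 0:
        raise ValueError("ventilator_days must be positive")
    return 1000.0 * vap_episodes / ventilator_days


# Hypothetical unit: 400 at-risk patients, 30 of whom develop one VAP
# episode each, accumulated over 3000 ventilator days.
print(f"{cumulative_incidence_pct(30, 400):.1f}% of at-risk patients")        # 7.5%
print(f"{incidence_density_per_1000(30, 3000):.1f} per 1000 ventilator days")  # 10.0
```

Because the denominators differ, the same unit can look different depending on which measure is reported, and neither figure says anything about which diagnostic criteria produced the 30 cases.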

Methods
The protocol for this review was published in PROSPERO (International Prospective Register of Systematic Reviews) under registration CRD42019125449 [33].

Search strategy
A comprehensive search strategy was developed by one of the authors (BA). The Cochrane Library, PubMed (MEDLINE), CINAHL, EMBASE, and Web of Science were searched electronically from January 2008 until November 2018. We limited the search to studies published from 2008 onwards, following the comprehensive systematic review of diagnostic methods published that year [32]. Medical Subject Headings (MeSH) and other search terms were used to interrogate the databases, built around three concepts: VAP, diagnostics, and ITU (the search terms and an example MEDLINE search are provided in Additional file 1). No restriction on publication language was applied. Additional publications were identified through Google searches and by hand-searching the reference lists of published articles.
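Concept-based searches like this are usually assembled by joining synonyms within a concept with OR and joining the concept blocks with AND. The sketch below illustrates that structure only; the synonym lists are hypothetical placeholders (the review's actual terms are in Additional file 1), and real databases each add their own field tags and controlled vocabulary on top of this generic Boolean form.

```python
# Hypothetical synonym lists for the three concepts; not the review's terms.
CONCEPTS = {
    "VAP": ["ventilator-associated pneumonia", "VAP"],
    "diagnostics": ["diagnosis", "sensitivity and specificity", "biomarker"],
    "ITU": ["intensive care unit", "critical care", "ICU"],
}


def build_query(concepts: dict[str, list[str]]) -> str:
    """OR the synonyms within each concept, then AND the concept blocks."""
    blocks = []
    for terms in concepts.values():
        # Quote multi-word phrases so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


print(build_query(CONCEPTS))
```

A record must match at least one term from every concept to be retrieved, which is why adding synonyms widens the search while adding concepts narrows it.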

Review strategy
All records were reviewed independently by the lead author (BA) and a second author (PM or JG), and disagreements were resolved by a third independent adjudicator (PM or JG). Titles and abstracts of all records were reviewed first, followed by full-text review against the inclusion and exclusion criteria. Studies were included if they: (1) enrolled adult ventilated patients of any gender, (2) were conducted in ITU settings, (3) addressed suspected VAP as defined in this study (after 48 h on the ventilator), and (4) focused on diagnostic procedures for VAP (clinical markers, biomarkers, chest x-ray, chest ultrasound (U/S), lung biopsy, BAL and mini-BAL, protected specimen brush (PSB), blind PSB, and endotracheal aspirate (ETA)). Studies were excluded if they: (1) were animal studies, (2) included patients under 18 years of age, (3) focused on VAP surveillance, (4) compared the diagnosis of VAP against the diagnosis of another illness, (5) were feasibility studies, (6) included participants already diagnosed with VAP, (7) investigated the effectiveness of VAP treatment by monitoring biomarkers or other diagnostics, (8) evaluated risk factors to predict VAP, (9) evaluated procedures used to predict mortality in VAP, or (10) were case-control studies. All papers that passed full-text review, and those that contained diagnostic technical terms, were examined by an ITU clinician (THC) to confirm their clinical relevance to the research question.
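The two-stage, dual-reviewer process described above can be expressed as a small decision procedure. This is a minimal sketch under stated assumptions: the names `Decision`, `reconcile`, and `screen_record` are hypothetical, and the adjudicator is modelled as a callback standing in for the third reviewer's judgement.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"


def reconcile(record_id: str, first: Decision, second: Decision,
              adjudicate: Callable[[str], Decision]) -> Decision:
    """Return the agreed decision; consult the third reviewer only on disagreement."""
    if first == second:
        return first
    return adjudicate(record_id)


def screen_record(record_id: str,
                  title_abstract: tuple[Decision, Decision],
                  full_text: tuple[Decision, Decision],
                  adjudicate: Callable[[str], Decision]) -> Decision:
    """Full text is only assessed if the record survives title/abstract screening."""
    first, second = title_abstract
    if reconcile(record_id, first, second, adjudicate) is Decision.EXCLUDE:
        return Decision.EXCLUDE
    first, second = full_text
    return reconcile(record_id, first, second, adjudicate)


def adjudicator(record_id: str) -> Decision:
    # Hypothetical stand-in for the third reviewer.
    return Decision.INCLUDE


decision = screen_record("record-0001",
                         (Decision.INCLUDE, Decision.EXCLUDE),  # disagreement: adjudicated
                         (Decision.INCLUDE, Decision.INCLUDE),  # agreement
                         adjudicator)
print(decision.value)  # include
```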

Quality assessment and data extraction
A team of 12 reviewers (systematic reviewers, clinicians, methodologists, and health economists) from the University of Northumbria, Newcastle University, The Newcastle Upon Tyne Hospitals NHS Foundation Trust, and the University of Edinburgh was involved in the quality assessment and data extraction process. All included papers were quality assessed, and their data extracted, by two authors independently. Disagreements were first discussed between the two reviewers; any remaining disagreement was resolved by a third reviewer. The quality assessment scoring checklist was adapted from the Critical Appraisal Skills Programme (CASP) checklist for diagnostic studies [34], a well-recognised tool for assessing methodological quality or risk of bias in primary and secondary medical studies [35,36] that has been used to assess the quality of diagnostic studies in systematic reviews [37-39]. Our scoring checklist contains 8 of the 12 questions in the CASP checklist. The questions in section C of the CASP diagnostic checklist ("Will the results help locally?") were not included in our scoring, because the main aim of the review did not concern the local application of the diagnostic procedures. Each study was assigned a score of '1' for each checklist item it was considered to meet and '0' otherwise, and a total score was calculated by summing the item scores, giving a maximum possible score of 8. Any study that scored '0' on the first or second question, or less than 5 out of 8 in total, was excluded. According to the CASP guide for diagnostic studies, if the answer to question 1 or 2 is "no" when appraising a study, it is not worth continuing; this leaves 6 of the 8 questions used in our quality assessment.
Considering these 6 questions equally important to one another but less important than the first 2 questions, we determined that a study must meet at least half of them (a score of 3 out of 6) to be considered for the review. Combined with the 2 mandatory questions, this gives the threshold of 5 out of 8, which was agreed by reviewer consensus as the minimum quality needed for a study to adequately address the research question.
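The inclusion rule can be checked mechanically. The sketch below assumes eight 0/1 item scores in checklist order, with the first two entries being CASP questions 1 and 2; the function name `casp_score` is hypothetical. It also confirms that "questions 1 and 2 met plus at least 3 of the remaining 6" is the same rule as "questions 1 and 2 met and a total of at least 5".

```python
def casp_score(items: list[int]) -> tuple[int, bool]:
    """Score a study on the review's 8-item checklist adapted from CASP.

    items: eight 0/1 scores in checklist order; items[0] and items[1]
    correspond to CASP questions 1 and 2.
    Returns (total score out of 8, whether the study is eligible).
    """
    if len(items) != 8 or any(s not in (0, 1) for s in items):
        raise ValueError("expected exactly eight scores of 0 or 1")
    total = sum(items)
    # CASP guidance: if question 1 or 2 is answered "no", stop appraising.
    passes_screening = items[0] == 1 and items[1] == 1
    # At least half (3) of the remaining 6 items must be met.
    meets_half_of_rest = sum(items[2:]) >= 3
    eligible = passes_screening and meets_half_of_rest
    # Sanity check: equivalent to "Q1 = Q2 = 1 and total >= 5".
    assert eligible == (passes_screening and total >= 5)
    return total, eligible


print(casp_score([1, 1, 1, 1, 1, 0, 0, 0]))  # (5, True)
print(casp_score([0, 1, 1, 1, 1, 1, 1, 1]))  # (7, False): fails question 1
print(casp_score([1, 1, 1, 1, 0, 0, 0, 0]))  # (4, False)
```

The second example shows why the total alone is not enough: a study can score 7/8 and still be excluded.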
A standardised data extraction form was developed by three authors (AJA, BA, THC) and reviewed by all authors (the quality assessment and data extraction forms are provided in Additional file 2). We recorded and present each study's country of origin, study size, male:female ratio of enrolled participants, index test(s) under investigation, reference standard used to define VAP, and test characteristics. Although the index test characteristics are not directly relevant to the aims of this review, we present them because several of the index tests are also used as reference standards. Test characteristics were taken directly from the studies or calculated from data reported in them. Where the original paper presented multiple test characteristics, we selected those highlighted by the original authors, those that best reflected the comparison, or those indicating the best performance. Where BAL was conducted, we recorded the details of the lavage procedure.
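Where test characteristics had to be calculated from reported data, the standard route is a 2×2 table of index test result against reference standard result. The sketch below shows those calculations; the class name and counts are hypothetical and do not come from any included study. Every value it produces is accuracy relative to the chosen reference standard, not to true disease status.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:  # positive predictive value
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:  # negative predictive value
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only.
table = TwoByTwo(tp=40, fp=15, fn=10, tn=85)
print(f"Se={table.sensitivity():.2f} Sp={table.specificity():.2f} "
      f"PPV={table.ppv():.2f} NPV={table.npv():.2f}")
```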
A narrative data synthesis approach was used to report the results from reviewed studies. Due to the large variation in practice, processes, and reference standards, a meta-analysis of diagnostic accuracy was not conducted.

Studies identified
The searches identified a total of 2441 articles. Records not published in English were translated into English using Google Translate. Of these articles, 2263 were excluded on the basis of title and abstract, and a further 123 were excluded at full-text screening because they were not clinically relevant to the inclusion criteria or met at least one of the exclusion criteria, leaving 55 articles for quality assessment (see Fig. 1 for the PRISMA flow chart).

Quality assessment and data extraction
Of the 55 studies examined at the quality assessment stage, 11 were excluded because they scored '0' for the first or second question or scored less than 5 out of 8 in total, leaving 44 studies included in this review. All scores were agreed by at least two reviewers and reviewed by the principal investigator (PI). The lowest score of any included study was 5 out of 8: three studies scored 8/8, 24 scored 7/8, 11 scored 6/8, and six scored 5/8 (see Table 1).
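The counts reported across the screening and quality assessment stages can be reconciled with simple arithmetic. The sketch below uses only the figures stated in the text and checks that the flow from 2441 records ends at 44 included studies, and that the reported score distribution also sums to 44.

```python
# Counts as reported in the Studies identified and Quality assessment sections.
identified = 2441
excluded_title_abstract = 2263
excluded_full_text = 123
excluded_quality = 11

full_text_reviewed = identified - excluded_title_abstract   # 178
quality_assessed = full_text_reviewed - excluded_full_text  # 55
included = quality_assessed - excluded_quality              # 44

score_distribution = {8: 3, 7: 24, 6: 11, 5: 6}  # score out of 8 -> number of studies
assert sum(score_distribution.values()) == included

for score, n in sorted(score_distribution.items(), reverse=True):
    print(f"{score}/8: {n} studies ({100 * n / included:.1f}%)")
```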
As expected, all papers suffered from bias in the accuracy estimates of the index test arising from comparison with an imperfect reference standard, a well-known issue in comparative diagnostic accuracy studies [84]. The results of the quality assessment reviews conducted using the form adapted from the CASP diagnostic study checklist showed that five papers [51,54,57,62,82] [47,65]. The methodology was not described in detail in three papers [40,45,82], and the results were not clearly presented in five papers [46,64,67,69,75]. There was a lack of certainty regarding the results on 11 occasions [40,42,43,46,58,64,65,67,68,74,75]. The CASP diagnostic study checklist guide was followed in assessing all points.

Discussion
To the best of our knowledge, this is the first and most comprehensive systematic review to evaluate the reference standard tests used to assess novel index tests for suspected VAP since Rea-Neto and colleagues' systematic review of diagnostic methods in 2008 [32]. We reviewed papers comparing a novel index test against a chosen reference standard to identify which reference standards have been used in diagnostic evaluation research for VAP. To deliver a high-quality systematic review, we excluded papers with a high risk of bias; all papers included in this review met at least 5 of the 8 criteria we adopted from the CASP checklist.
Microbiological culture was the sole criterion, or a component criterion, in the vast majority of studies. Overall, culture of BAL fluid was the most common reference standard, with the most common growth threshold being > 10⁴ CFU/mL. This was occasionally used in combination with another criterion, such as the demonstration of intracellular organisms in more than 2% of the total number of BAL cells. The remaining studies used composite reference standards incorporating a variety of existing clinical scores, existing surveillance definitions, radiological assessments, clinical parameters, and microbiological methods including culture. We found large variation in practice, processes, and reference standards, which highlights the inconsistency in the current diagnosis of VAP and made a meta-analysis of diagnostic accuracy challenging. Biological, clinical, and statistical heterogeneity makes comparisons across the different studies difficult and subjective. The included papers were of variable but generally good quality, and the review indicates what has been and is being done globally with respect to the use of reference standards in the diagnosis of VAP. The line between composite criteria and a sole microbiological criterion was often blurred: many studies in the sole microbiological criterion group had strict, objective clinical and radiological enrolment criteria. Where these criteria were applied before enrolment, and therefore applied to both the index tests and the reference standard, we have not incorporated them into the description of the reference standard.
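The most common microbiological reference standard can be written as a simple classification rule. In this sketch the thresholds (growth > 10⁴ CFU/mL; intracellular organisms in > 2% of BAL cells) come from the text, but the `combine` parameter is a hypothetical knob: the text does not specify whether studies required both criteria or either one, so both options are shown. The function name is also illustrative.

```python
from typing import Optional

BAL_CULTURE_THRESHOLD = 1e4  # CFU/mL; positive only if growth is strictly greater
ICO_THRESHOLD_PCT = 2.0      # % of BAL cells containing intracellular organisms


def bal_reference_positive(cfu_per_ml: float,
                           ico_pct: Optional[float] = None,
                           combine: Optional[str] = None) -> bool:
    """Classify a BAL sample against a microbiological reference standard.

    combine=None  -> culture criterion alone (> 10^4 CFU/mL)
    combine="and" -> culture AND intracellular organisms > 2% (assumed variant)
    combine="or"  -> culture OR intracellular organisms > 2% (assumed variant)
    """
    culture_positive = cfu_per_ml > BAL_CULTURE_THRESHOLD
    if combine is None:
        return culture_positive
    if ico_pct is None:
        raise ValueError("ico_pct is required when combining criteria")
    ico_positive = ico_pct > ICO_THRESHOLD_PCT
    if combine == "and":
        return culture_positive and ico_positive
    if combine == "or":
        return culture_positive or ico_positive
    raise ValueError("combine must be None, 'and' or 'or'")


print(bal_reference_positive(1e4))              # False: not above the threshold
print(bal_reference_positive(5e4))              # True
print(bal_reference_positive(1e3, 3.5, "or"))   # True: intracellular criterion met
print(bal_reference_positive(5e4, 1.0, "and"))  # False: intracellular criterion not met
```

The "or" variant is one way a study might use the intracellular organism count to offset culture negativity after prior antibiotics; the "and" variant instead makes the standard stricter.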
A key question in diagnostic accuracy research with imperfect reference standards is whether the reference standard used to assess novel diagnostics should be 'more inclusive' (higher sensitivity, lower specificity) or 'less inclusive' (lower sensitivity, higher specificity). Using microbiological criteria alone has good face validity but risks missing cases of 'true VAP' or including false positives through contamination (although prior specification of clinically suspected VAP reduces this risk). Importantly, both possibilities are potentially strongly influenced by operator technique and expertise, especially for BAL; this contrasts with diagnostics that rely on blood sampling or imaging. BAL culture was the most common microbiological method found in this review, and its use is potentially problematic for several reasons. Firstly, in a recent systematic review using histopathological examination of lung tissue as the reference standard, BAL culture had a sensitivity of 71.1% and a specificity of 79.6% [86], echoing previous findings that microbiological examination does not correlate well with histopathological examination [32]. Secondly, the timing and nature of prior antibiotic therapy may reduce sample positivity [87,88], although this problem could conceivably be addressed by incorporating a criterion based on the percentage of host cells containing invading organisms, a measure not affected by prior antibiotic therapy [75]. Thirdly, the BAL procedure itself is not standardised, and the requirements for sample collection are not uniform. Whilst this may have little impact on bacterial growth, as one of the included studies confirmed [55], variation between studies in the use of a bronchial discard may plausibly lead to variation in sensitivity for detecting the causative pathogen.
Sole microbiological criteria also risk introducing cases of 'false VAP' through contamination [87], although this risk is reduced by using distal or protected specimens. Of relevance, the quality and consistency of BAL procedures are likely to be higher in research studies than in routine clinical practice, which could further affect the validity of BAL culture as a reference standard.
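To see how an imperfect reference standard distorts the measured accuracy of an index test, one can assume that the index test and the reference err independently given true disease status (conditional independence, a simplifying assumption that may not hold in practice). The sketch below uses the BAL culture figures quoted from [86] (sensitivity 71.1%, specificity 79.6% against histopathology); the 40% prevalence and the perfect index test are hypothetical inputs.

```python
def apparent_accuracy(prevalence: float, se_index: float, sp_index: float,
                      se_ref: float, sp_ref: float) -> tuple[float, float]:
    """Sensitivity and specificity of an index test as measured against an
    imperfect reference standard, assuming conditional independence of the
    two tests given true disease status."""
    p, q = prevalence, 1 - prevalence
    p_ref_pos = p * se_ref + q * (1 - sp_ref)
    p_both_pos = p * se_index * se_ref + q * (1 - sp_index) * (1 - sp_ref)
    p_both_neg = p * (1 - se_index) * (1 - se_ref) + q * sp_index * sp_ref
    return p_both_pos / p_ref_pos, p_both_neg / (1 - p_ref_pos)


# Reference: BAL culture (Se 0.711, Sp 0.796 versus histopathology [86]).
# Index test: hypothetically perfect. Prevalence of 0.40: assumed.
se_app, sp_app = apparent_accuracy(0.40, 1.0, 1.0, 0.711, 0.796)
print(f"apparent Se = {se_app:.2f}, apparent Sp = {sp_app:.2f}")
```

Under these assumptions a perfect index test would appear to have a sensitivity of about 0.70 and a specificity of about 0.81, purely because of errors in the reference standard.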
Composite criteria may address the problem of missed cases of 'true VAP', and there is no limit on the number or thresholds of additional criteria. Additional criteria can be made mandatory to increase specificity or optional to increase sensitivity. Some studies in this review relied on existing surveillance definitions for VAP, while others used their own composite standards. The existing surveillance definitions were designed to monitor VAP rates objectively and reproducibly, not to identify true VAP in a robustly sensitive and specific manner, although face validity amongst clinicians matters for a quality indicator. Other composite standards incorporated radiological assessments. It has been shown that many clinicians do not consider chest x-ray changes integral to the diagnosis [89], that the performance characteristics of chest x-ray may not meet the requirements of a diagnostic standard [90-92], and that inter- and intra-observer variability in chest x-ray assessment is high [93,94]. These issues mean that radiology should be incorporated into any novel reference standard with caution. Many studies incorporated clinical signs, which plausibly reduces the risk of false positives; although this makes physiological sense, there is minimal evidence to support it. In developing the novel CDC VAP surveillance algorithm, Klompas et al. showed that deterioration in oxygenation after a period of stability was associated with clinically important outcomes, whereas the addition of other clinical measures such as abnormal temperature, abnormal white blood cell count, or purulent secretions was not [95]. However, a lack of correlation with clinically important outcomes is not the same as a lack of correlation with a true diagnosis of VAP; this distinction matters when the decision based on the test concerns therapy (antibiotic use) rather than prognosis.
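The trade-off between mandatory and optional criteria can be made concrete with the standard formulas for combining tests, assuming the criteria err independently of one another given true disease status. That assumption is rarely true for clinical signs, and the two criteria below are hypothetical, so the numbers illustrate direction rather than magnitude.

```python
from math import prod


def all_mandatory(criteria: list[tuple[float, float]]) -> tuple[float, float]:
    """Every criterion must be met (AND). Inputs are (sensitivity, specificity)."""
    se = prod(se_i for se_i, _ in criteria)
    sp = 1 - prod(1 - sp_i for _, sp_i in criteria)
    return se, sp


def any_sufficient(criteria: list[tuple[float, float]]) -> tuple[float, float]:
    """Meeting any one criterion is enough (OR)."""
    se = 1 - prod(1 - se_i for se_i, _ in criteria)
    sp = prod(sp_i for _, sp_i in criteria)
    return se, sp


criteria = [(0.80, 0.70), (0.80, 0.70)]  # two hypothetical criteria
print("AND: Se=%.2f Sp=%.2f" % all_mandatory(criteria))
print("OR:  Se=%.2f Sp=%.2f" % any_sufficient(criteria))
```

Making both criteria mandatory gives a sensitivity of 0.64 and a specificity of 0.91, whereas accepting either gives 0.96 and 0.49: the 'less inclusive' and 'more inclusive' choices discussed above.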
No studies relied upon histopathological diagnosis to confirm VAP. This is not surprising for practical reasons: histopathology cannot be undertaken routinely and safely in all patients with suspected VAP, either at the time of the index test or later. Histopathological analysis may also be inaccurate because of sampling artefacts, the limited representativeness of a small piece of tissue, and displacement in time from the period of peak infection. This systematic review cannot establish the appropriate reference standard for diagnostic evaluation research in VAP; it simply identifies the methods chosen by researchers and confirms the lack of a standardised approach. Researchers must decide whether it is more important to be 'more inclusive' or 'less inclusive', and future studies may wish to adopt the strategy used by one of the studies in this review [61]: grading the certainty of VAP from possible to probable to definite using a composite definition.
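A graded composite definition separates two decisions: how certain a case is, and which level of certainty counts as a case for a given analysis. The sketch below illustrates that structure only. The grading rule, its criteria, and the function names are hypothetical and are not the definition used in [61].

```python
from enum import IntEnum


class Certainty(IntEnum):
    NOT_VAP = 0
    POSSIBLE = 1
    PROBABLE = 2
    DEFINITE = 3


def grade_vap(clinically_suspected: bool, radiological_change: bool,
              positive_quantitative_culture: bool) -> Certainty:
    """Hypothetical grading rule for illustration; not the definition in [61]."""
    if not clinically_suspected:
        return Certainty.NOT_VAP
    supporting = int(radiological_change) + int(positive_quantitative_culture)
    # 0 supporting criteria -> POSSIBLE, 1 -> PROBABLE, 2 -> DEFINITE
    return Certainty(supporting + 1)


def is_case(grade: Certainty, minimum: Certainty) -> bool:
    """A lower minimum grade gives a 'more inclusive' reference standard."""
    return grade >= minimum


g = grade_vap(True, True, False)       # PROBABLE
print(is_case(g, Certainty.POSSIBLE))  # True: more inclusive cut-off
print(is_case(g, Certainty.DEFINITE))  # False: less inclusive cut-off
```

Because the grade is recorded once, the same study data can be analysed against both a 'more inclusive' and a 'less inclusive' reference standard.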
Our review has three main limitations. Firstly, a patient must be at risk of VAP to be diagnosed with it, and there is no standard definition of patients at risk; for the purposes of this study, we defined those at risk of VAP as patients who had undergone more than 48 h of mechanical ventilation. Secondly, many included studies enrolled only patients with suspected VAP, which means that many of the listed reference standards must be prefixed with "clinically suspected VAP". We did not systematically record this level of clinical suspicion, which is particularly noteworthy when considering the reference standards listed in Table 3. Thirdly, although data extraction for this review was completed before the onset of the Coronavirus Disease 2019 (COVID-19) pandemic, the pandemic nonetheless delayed the delivery of this review.

Conclusion
BAL culture with a microbiological growth threshold of > 10⁴ CFU/mL is the commonest reference standard used to examine the utility of a novel index test for VAP amongst patients who are at risk for, and clinically suspected of, VAP. Composite reference standards were used in 12 of the 44 reviewed studies (27.3%). Nearly all reference standards for VAP identified in this review required some microbiological confirmation of infection. The studies identified in this review highlight the need for a standardised approach to diagnosing VAP, which may include developing a data-driven composite reference standard from large cohort studies.