This single-center, retrospective study was approved by the institutional review board. The requirement for informed consent was waived.
Study population
We retrospectively included patients from a single tertiary referral hospital if they met the following inclusion criteria: (a) patients with a body temperature ≥ 38.3 °C or a sustained body temperature ≥ 38.0 °C for an hour between January and December 2017 [13]; (b) patients with absolute neutrophil count (ANC) < 500/mm3 in peripheral blood on the day of fever [13]; and (c) patients who underwent CXRs on the day of fever.
In cases of multiple CXRs from a single patient, only the first CXR was included to represent the episode of FN. However, if there was an afebrile period of 30 days or longer between the days of fever, they were regarded as different episodes of FN and the first CXR in each episode was included.
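The inclusion criteria and the episode rule above can be sketched as follows. This is an illustrative Python sketch, not the code used in the study; the function and parameter names are hypothetical, and it assumes that an "afebrile period of 30 days or longer" can be approximated by a gap of at least 30 days between consecutive fever dates.

```python
from datetime import date

def meets_fn_criteria(peak_temp_c, sustained_38_for_1h, anc_per_mm3, cxr_on_fever_day):
    """Check inclusion criteria (a)-(c) for one fever day (hypothetical helper)."""
    # (a) temperature >= 38.3 C, or >= 38.0 C sustained for an hour
    fever = peak_temp_c >= 38.3 or sustained_38_for_1h
    # (b) ANC < 500/mm3 and (c) CXR obtained on the day of fever
    return fever and anc_per_mm3 < 500 and cxr_on_fever_day

def episode_start_days(fever_days, gap_days=30):
    """Return the fever days that open a new FN episode for one patient.

    Assumption: a new episode starts when the gap between consecutive
    fever dates is gap_days or longer. The first CXR of each episode
    would then be the one obtained on the returned day.
    """
    starts = []
    previous = None
    for day in sorted(fever_days):
        if previous is None or (day - previous).days >= gap_days:
            starts.append(day)
        previous = day
    return starts

# Toy example (not study data): two episodes separated by more than 30 days.
toy_days = [date(2017, 1, 5), date(2017, 1, 7), date(2017, 3, 1), date(2017, 3, 20)]
toy_starts = episode_start_days(toy_days)
```

With these toy dates, only the first fever day of each episode is retained, so a patient contributes one CXR per episode.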
CXR acquisition and CAD system
All CXRs were obtained using a fixed radiography system (Digital Diagnost, Philips Healthcare, Eindhoven, The Netherlands) or a portable radiography scanner (DRX-revolution, Carestream Health, Rochester, NY), depending on the patient’s condition. CXRs from the fixed radiography system were obtained in the erect position with posteroanterior projection, while CXRs from the portable radiography scanner were obtained in the supine or sitting position with anteroposterior projection.
We used a commercially available, regulatory-approved deep learning-based CAD (Lunit INSIGHT CXR 2, version 2.0.0.0, Lunit Inc., Seoul, Korea) to evaluate CXRs. The CAD was designed to detect radiographic abnormalities, including pulmonary nodules or masses, pulmonary infiltrates, and pneumothorax, on a single frontal CXR. The CAD was initially trained using 54,221 normal CXRs and 35,613 abnormal CXRs, including 6,903 CXRs from patients with pneumonia (see supplementary material for further information) [21].
The CAD provided probability scores for the presence of each target abnormality between 0 and 100% for a CXR. When the probability score was 15% or greater, the CAD also provided a heat map overlaid on the CXR for the localization of the abnormality (Fig. 1). In the present study, we used only probability scores for pulmonary infiltrate (see supplementary material for further information).
A CXR of a 68-year-old woman with febrile neutropenia (absolute neutrophil count, 366/mm3; body temperature, 39.1 °C) showing increased opacity in the right upper lung (A, arrow). A chest computed tomographic scan obtained on the same day showing the corresponding consolidation and ground-glass opacity lesions in the upper lobe of the right lung, suggestive of pneumonia (B). The CAD system correctly identified the abnormality with a probability score of 84% (C). In the reader test, three of five radiologists correctly identified the pulmonary infiltrate on the CXR in the radiologist-alone interpretation, and all five radiologists identified the lesion in the interpretation with CAD.
A CXR of a 69-year-old man with febrile neutropenia (absolute neutrophil count, 489/mm3; body temperature, 38.9 °C) showing subtle increased opacity in the right lower lung (D). A chest computed tomographic scan showing the corresponding consolidation and ground-glass opacity lesions in the lower lobe of the right lung, suggestive of pneumonia (E). The CAD system correctly identified the abnormality with a probability score of 51% (F). In the reader test, two of five radiologists correctly identified the pulmonary infiltrate on the CXR in the radiologist-alone interpretation, and four radiologists, including two who initially did not recognize the lesion, correctly identified it in the interpretation with CAD.
Reference standard
Two thoracic radiologists (C.M.P. and E.J.H., with 21 and 9 years of experience, respectively, in interpreting CXRs and chest CT) who were blinded to the CAD results reviewed patients’ medical records and radiological, microbial, and laboratory examination results to define, in consensus, the clinical diagnosis of pneumonia at the time of CXR. The presence of clinical features suggestive of respiratory infection with a demonstrable pulmonary infiltrate on CXR or CT was regarded as a clinical diagnosis of pneumonia, regardless of the microbiological identification of the causative pathogen [23]. When a pulmonary infiltrate was identified on CT but was not visible on CXR, the case was defined as positive for pneumonia based on the CT findings.
Reader test
To compare the performance of CAD with that of radiologists and to evaluate whether CAD can enhance the performance of radiologists, we conducted a retrospective reader test. Five board-certified radiologists (general radiologists without subspecialty training for thoracic radiology, 1–3 years of experience after finishing residency) participated in the reader test, and 50% of CXRs among the entire cohort were randomly sampled for the test.
Each radiologist independently interpreted CXRs to classify them into those with or without pulmonary infiltrates suggestive of pneumonia. First, radiologists read the CXRs without CAD results (radiologist-alone interpretation). After finishing the first reading session for all CXRs, they re-interpreted the CXRs with the corresponding CAD results and were allowed to change their initial decision as needed (interpretation with CAD). The radiologists were informed that all CXRs were obtained from FN patients; however, they were blinded to other clinical or laboratory information.
Subgroup analyses
To obtain a more robust reference standard for the presence of pulmonary infiltrates, we separately evaluated the performance of CAD in patients who had a chest CT obtained within 3 days of the CXR, using the CT as the reference for the presence of pulmonary infiltrates.
To investigate the performance of CAD in patients with different clinical characteristics, we evaluated its performance in the following subgroups: (a) male vs. female patients; (b) patients aged < 60 years vs. ≥ 60 years; and (c) CXRs from a fixed radiography system vs. CXRs from a portable radiography scanner.
Statistical analyses
The discriminative performance of CAD (ability to separate CXRs with and without pneumonia) was evaluated using the area under the receiver operating characteristic curve (AUC). The sensitivity and specificity of CAD were determined at the pre-defined threshold (probability score of 15%) recommended by the manufacturer. AUCs of CAD were compared between subgroups using the DeLong method [24], while the chi-square test was used to compare the sensitivities and specificities of CAD between subgroups.
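For illustration, the AUC and the threshold-based sensitivity and specificity described above can be computed as in the following minimal Python sketch. It is not the R code used in the study: the AUC is obtained from the Mann-Whitney formulation (the proportion of positive-negative pairs in which the positive case has the higher score, with ties counted as one half), the function names are hypothetical, and the scores and labels are toy values rather than study data.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold=15.0):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy probability scores (%) and reference labels (1 = pneumonia).
toy_scores = [84, 51, 10, 3, 22, 14]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc_mann_whitney(toy_scores, toy_labels)
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels)
```

The AUC summarizes ranking performance across all possible thresholds, whereas sensitivity and specificity describe performance at the single 15% operating point.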
To compare the performance between radiologists and CAD, the sensitivity and specificity of CAD at the pre-defined threshold were compared with those of individual radiologists using generalized estimating equations, to consider the clustering effects caused by multiple CXRs obtained from a single patient and multiple interpretations by radiologists for a single CXR [25]. The sensitivities and specificities of radiologists in radiologist-alone interpretation and interpretation with CAD were also compared to investigate whether or not CAD can enhance radiologists’ detection performance.
Calibration of CAD (degree of agreement between the probabilities predicted by CAD and the observed probability of pneumonia) was evaluated by constructing a calibration plot. Inter-reader agreement among radiologists was evaluated using Fleiss’ kappa coefficient.
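The two quantities above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the analysis code of the study: the calibration points are obtained by grouping scores into equal-width bins (the binning scheme used for the actual plot is not specified in the text), and Fleiss’ kappa follows its standard definition for a fixed number of raters per case. Function names and data are hypothetical.

```python
def calibration_bins(scores, labels, n_bins=5):
    """Points for a calibration plot from scores in percent (0-100).

    Returns (mean predicted probability, observed fraction, count)
    for each non-empty equal-width bin. Equal-width binning is an assumption.
    """
    width = 100.0 / n_bins
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s / width), n_bins - 1)  # a score of 100 goes to the last bin
        bins[idx].append((s, y))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(s for s, _ in members) / len(members) / 100.0
            observed = sum(y for _, y in members) / len(members)
            points.append((mean_pred, observed, len(members)))
    return points

def fleiss_kappa(ratings):
    """Fleiss' kappa from per-case category counts.

    ratings[i][j] is the number of raters assigning case i to category j;
    every row must sum to the same number of raters.
    """
    n_cases = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Overall proportion of assignments to each category
    p_j = [sum(row[j] for row in ratings) / (n_cases * n_raters) for j in range(n_cats)]
    # Per-case agreement among rater pairs
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in ratings]
    p_bar = sum(p_i) / n_cases
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Toy data: five readers, two categories (infiltrate present, absent).
toy_points = calibration_bins([5, 10, 90, 95], [0, 0, 1, 1], n_bins=2)
toy_kappa = fleiss_kappa([[5, 0], [0, 5], [5, 0], [0, 5]])
```

A well-calibrated system yields points close to the diagonal of the calibration plot, and a kappa of 1 corresponds to complete agreement among readers.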
All statistical analyses were performed using R (version 3.6.3, R project for statistical computing, Vienna, Austria). A P value of less than 0.05 was considered to indicate a statistically significant difference.