Visual classification of three computed tomography lung patterns to predict prognosis of COVID-19: a retrospective study

Background: Quantitative evaluation of radiographic images has been developed and suggested for the diagnosis of coronavirus disease 2019 (COVID-19). However, there are limited opportunities to use these image-based diagnostic indices in clinical practice. Our aim in this study was to evaluate the utility of a novel visually-based classification of pulmonary findings from computed tomography (CT) images of COVID-19 patients, with the following three patterns defined: peripheral, multifocal, and diffuse findings of pneumonia. We also evaluated the prognostic value of this classification to predict the severity of COVID-19.

Methods: This was a single-center retrospective cohort study of patients hospitalized with COVID-19 between January 1st and September 30th, 2020, who presented with suspicious findings on CT lung images at admission (n = 69). We evaluated the association between the three predefined patterns (peripheral, multifocal, and diffuse) and admission to the intensive care unit, tracheal intubation, and death. We also tested quantitative CT analysis as an outcome predictor for COVID-19. Quantitative CT analysis was performed using a semi-automated method (Thoracic Volume Computer-Assisted Reading software, GE Healthcare, United States). Lung volumes were divided by Hounsfield unit (HU) intervals. Compromised lung (% CL) volume was the sum of poorly aerated and non-aerated volumes (−500 to 100 HU). We collected patient clinical data, including demographic and clinical variables at the time of admission.

Results: Patients with a diffuse pattern were intubated more frequently and for a longer duration than patients with a peripheral or multifocal pattern. The following clinical variables were significantly different between the diffuse pattern and the peripheral and multifocal groups: body temperature (p = 0.04), lymphocyte count (p = 0.01), neutrophil count (p = 0.02), C-reactive protein (p < 0.01), lactate dehydrogenase (p < 0.01), Krebs von den Lungen-6 antigen (p < 0.01), D-dimer (p < 0.01), and steroid (p = 0.01) and favipiravir (p = 0.03) administration.

Conclusions: Our simple visual assessment of CT images can predict the severity of illness, a resulting decrease in respiratory function, and the need for supplemental respiratory ventilation among patients with COVID-19.


Background
Since the first outbreak of coronavirus disease 2019 (COVID-19), numerous patients have been admitted to the hospital with respiratory symptoms. Clinical manifestations of COVID-19 range from asymptomatic or mild upper respiratory tract disease to severe interstitial pneumonia with respiratory failure, requiring oxygen support and intubation [1][2][3][4][5]. More than a year after the worldwide COVID-19 outbreak, there are no signs of the pandemic abating.
Numerous clinical and imaging markers of disease severity have been reported. Among imaging markers, computed tomography (CT) provides the most sensitive radiological technique for the diagnosis of COVID-19, revealing diffuse lung alterations, ranging from ground-glass opacity to consolidation [5][6][7][8][9][10]. In addition, different radiological lung patterns are manifested over the course of the disease. Research using quantitative methods to evaluate lung CT images and derive various image analysis scores has suggested the possibility of predicting the severity of the disease [10][11][12][13][14]. However, there are limited opportunities in clinical practice to use these quantitative image analysis approaches [11,12,14]. To address this limitation, we recently conducted a single-center retrospective study of patients with COVID-19 at St. Luke's International Hospital, Tokyo, to investigate whether a simple visual assessment of CT images could predict the severity of COVID-19. We identified three patterns of pneumonia findings based on visual analysis of CT images, which could be associated with disease severity. Our primary objective in this study was to investigate the reliability of the classification of these patterns and their utility in predicting the clinical outcomes of COVID-19.

Study oversight
This retrospective study was approved by the institutional review board of St. Luke's International Hospital (20-R220). Informed consent was waived owing to the retrospective study design. We used our hospital's electronic medical record database to retrospectively identify 224 consecutive patients who had been admitted to the hospital with a diagnosis registered as "COVID-19" or "suspected of COVID-19" from January 2020 to September 2020. Only patients with a COVID-19 diagnosis confirmed by reverse transcription-polymerase chain reaction were selected. Patients who did not have a CT scan at admission, as well as those with a history of lung resection, were excluded. Patients with no abnormal lung findings on CT were also excluded. Ultimately, 69 consecutive patients with COVID-19 were included in our analysis (Fig. 1).

CT protocols
In each patient, whole-lung CT was performed under static conditions during an end-inspiratory hold whenever possible. CT imaging was performed using either a 256-detector scanner (Revolution CT, GE Healthcare) or a 64-detector scanner (Optima CT660, GE Healthcare). The parameters for CT examinations performed on the Revolution CT unit were as follows: 120-kV tube voltage, 50 to 650 mA tube current, 80 mm collimation, 0.992 pitch, 320 mm field of view (FOV), and 512 × 512 matrix. Parameters for the Optima CT660 unit were as follows: 120-kV tube voltage, 50 to 560 mA tube current, 40 mm collimation, 0.984 pitch, 320 mm FOV, and 512 × 512 matrix. An unenhanced scan was obtained for all patients. The dose length product was 525.54 ± 290.7 mGy·cm, with a volume CT dose index of 11.21 ± 3.75 mGy.

CT findings
Pulmonary opacities were classified into peripheral, multifocal, and diffuse patterns according to the classification by Akira et al. [15]. In the peripheral pattern, parenchymal opacification predominantly appeared in the subpleural peripheral zone. In contrast, in the multifocal pattern, multiple parenchymal opacifications were apparent in both central and peripheral regions. The diffuse pattern revealed generalized pulmonary involvement, with or without heterogeneity (Fig. 2). Further CT findings included consolidation, linear opacities, reversed halo sign, and crazy-paving sign [5,6,[8][9][10]13]. The definitions of these CT findings were based on the uniform terms for thoracic imaging by the Fleischner Society [16]. We also assessed the number of affected lobes.
All CT assessments were reviewed by two radiologists (DY and KI), with 5 and 8 years of experience, respectively, who were blinded to clinical patient data. These two radiologists were involved with the original CT assessment. For the study, CT assessment was performed without any clinical information, at least 3 months after the initial clinical assessment. The CT images were randomized for assessment to prevent recall bias. Discrepancies in classification of CT findings between the two radiologists were resolved by consensus.

Quantitative analysis
The dataset was anonymized and exported to a dedicated segmentation suite for medical image computing (GE Healthcare, USA), equipped with a semi-automated segmentation algorithm (Thoracic Volume Computer-Assisted Reading software). The software performed a first-pass automated segmentation. Lung volumes were then manually refined using three-dimensional tools, such as spherical brushes or erasers. A complete segmentation included both lungs with interstitial structures, segmental vessels, and bronchi. The major pulmonary arteries and bronchi, all mediastinal structures, any pleural effusion, and lung masses (e.g., tumors, fungal disease) were excluded. We extracted the lung volumes and calculated the percentage of the total volume falling within each of the following Hounsfield unit (HU) intervals: non-aerated (% NNL, −100 to 100 HU), poorly aerated (% PAL, −500 to −101 HU), normally aerated (% NAL, −900 to −501 HU), and hyperinflated (−1000 to −901 HU) [17]. The additional "compromised lung" (% CL) volume was calculated as the sum of % PAL and % NNL (−500 to 100 HU) (Fig. 2). The authors in charge of the segmentation (T.S. and D.U.) were unaware of the laboratory and clinical parameters or hospitalization outcomes of patients. Any discrepancies were resolved by consensus between the two radiologists. The principal investigator reviewed and confirmed all segmentations before data entry. We recorded the time required to complete each analysis.
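As a rough illustration of the interval-based calculation described above, the following Python sketch assigns each segmented lung voxel to an aeration compartment and derives % CL. It is not the Thoracic VCAR implementation. The function name `aeration_percentages`, the treatment of HU values as integers, and the choice to use only voxels within −1000 to 100 HU as the denominator are assumptions made for this example.

```python
def aeration_percentages(hu_values):
    """Return the percentage of lung voxels in each HU compartment.

    hu_values: iterable of integer HU values for voxels inside the
    lung segmentation (hypothetical input, not the study data).
    Voxels outside -1000..100 HU are ignored (an assumption).
    """
    counts = {"NNL": 0, "PAL": 0, "NAL": 0, "hyperinflated": 0}
    for hu in hu_values:
        if hu > 100 or hu < -1000:
            continue                      # outside the analysed range
        if hu >= -100:
            counts["NNL"] += 1            # non-aerated: -100 to 100 HU
        elif hu >= -500:
            counts["PAL"] += 1            # poorly aerated: -500 to -101 HU
        elif hu >= -900:
            counts["NAL"] += 1            # normally aerated: -900 to -501 HU
        else:
            counts["hyperinflated"] += 1  # hyperinflated: -1000 to -901 HU

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no voxels within -1000 to 100 HU")

    result = {name: 100.0 * n / total for name, n in counts.items()}
    # Compromised lung is the sum of poorly aerated and non-aerated lung.
    result["CL"] = result["PAL"] + result["NNL"]
    return result


if __name__ == "__main__":
    # Ten made-up voxel values, for illustration only.
    demo = [50, -50, -300, -700, -700, -700, -950, -800, -600, -850]
    print(aeration_percentages(demo))
```

In practice the voxel values would come from the segmented CT volume, and voxel counts would be converted to volumes using the voxel spacing; since all voxels in one scan share the same size, the percentages are unaffected.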

Data sources
We used chart review to obtain clinical information, including physical examination findings and laboratory data at the time of admission; and the clinical course after hospitalization. Moreover, we collected data for the date of onset, the date of CT imaging, the time between onset and CT scan, the date of hospitalization, the duration of hospitalization, the presence of tracheal intubation, the duration of tracheal intubation, history of intensive care unit (ICU) admission, and death. The following clinical information was collected immediately after hospitalization: respiratory rate, oxygen saturation (SpO2), partial pressure of oxygen (PaO2), pulse rate, systolic blood pressure, diastolic blood pressure, body temperature, white blood cell count, lymphocyte count, neutrophil count, platelet count, C-reactive protein (CRP), lactate dehydrogenase (LDH), Krebs von den Lungen-6 antigen (KL-6), D-dimer, and if the patient was on steroids, heparin, or favipiravir. The steroids administered were dexamethasone 6.6 mg/day or methylprednisolone 1 mg/kg/day intravenously. Heparin was administered by continuous infusion of heparin Na (10,000 units/day) or subcutaneous injection of heparin Ca (5,000 units) twice daily. Favipiravir was administered at a dose of 1800 mg twice daily on day 1, and 800 mg twice daily on day 2 and thereafter.

Fig. 2 (partial caption) b In the multifocal pattern, parenchymal opacification is apparent in the central and peripheral regions. c The diffuse pattern reveals generalized pulmonary involvement, with regional inhomogeneity. d-f Semi-automated segmentation using Thoracic VCAR software (GE Healthcare, USA). Blue areas represent normal lung parenchyma in the −900 to −501 HU interval; light blue areas represent hyperinflated lung in the −1000 to −901 HU interval; yellow areas represent poorly aerated lung in the −500 to −100 HU interval; and red areas represent non-aerated lung in the −100 to 100 HU interval. COVID-19, coronavirus disease 2019; HU, Hounsfield unit; VCAR, Volume Computer-Assisted Reading

Statistical analyses
To evaluate the reproducibility of the classification, we calculated the interobserver reliability for each finding using the Cohen kappa value. The following ratings were used to interpret the kappa value: poor, < 0.40; moderate, 0.40-0.59; good, 0.60-0.80; and excellent, > 0.80. The association between the three CT patterns and compromised lung (% CL) was evaluated using the Kruskal-Wallis test. The association between the three CT patterns and ICU admission, intubation management, and death was assessed using the chi-squared test. The Kruskal-Wallis test was used to compare the duration of hospitalization, duration of intubation, and time from onset to CT scan between the three CT patterns. Univariate and multivariate logistic regression analyses were performed to evaluate the association between each CT finding and intubation. The correlation between the CT patterns and clinical information was calculated. Parametric Fisher analysis of variance (ANOVA) was used for between-pattern comparison of clinical variables with a normal distribution (pulse rate, systolic blood pressure, diastolic blood pressure, body temperature, neutrophil count, platelet count, CRP, and D-dimer); the Kruskal-Wallis test was used for variables with a non-normal distribution (respiratory rate, SpO2, PaO2, white blood cell count, lymphocyte count, neutrophil count, LDH, KL-6, Acute Physiology and Chronic Health Evaluation-II score, and sequential organ failure assessment [SOFA] score). Differences in the use of steroids, heparin, and favipiravir between the three patterns were evaluated using the chi-squared test. Subsequently, we conducted the Bonferroni test and the Mann-Whitney U test as post-hoc tests. Univariate and multivariate analyses were performed on the CT findings. All statistical analyses were performed using Stata 16.1 (StataCorp LP, TX, United States). A p value < 0.05 was considered statistically significant.
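To make the interobserver reliability measure concrete, the sketch below computes an unweighted Cohen kappa for two readers and maps it to the rating scale given above. The study itself used Stata; this Python version is only an illustration. The function names, the example ratings, and the decision to treat values between 0.59 and 0.60 as "moderate" are assumptions for the example.

```python
from collections import Counter


def cohen_kappa(reader1, reader2):
    """Unweighted Cohen kappa for two equal-length lists of labels."""
    if len(reader1) != len(reader2) or not reader1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(reader1)
    # Observed agreement: fraction of cases given the same label.
    observed = sum(a == b for a, b in zip(reader1, reader2)) / n
    # Chance agreement from each reader's marginal label frequencies.
    c1, c2 = Counter(reader1), Counter(reader2)
    expected = sum(c1[label] * c2[label] for label in set(c1) | set(c2)) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Rating scale from the text: poor, moderate, good, excellent."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:   # assumption: 0.59-0.60 is treated as moderate
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Hypothetical pattern labels from two readers (not study data).
    r1 = ["peripheral", "peripheral", "multifocal", "diffuse"]
    r2 = ["peripheral", "multifocal", "multifocal", "diffuse"]
    k = cohen_kappa(r1, r2)
    print(round(k, 3), interpret_kappa(k))
```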

Results
Pretreatment CT images for each patient were classified into one of the following three patterns: diffuse, multifocal, and peripheral. There was good interobserver reproducibility in the classification of images (κ = 0.74).
There were differences in the rate of ICU admission, tracheal intubation, and death between the three patterns (p = 0.07, p < 0.01, and p = 0.06, respectively), with the difference in the rate of tracheal intubation alone being significant. Further, each pattern was associated with a different duration of hospitalization, duration of tracheal intubation, and the time from the onset to CT scan (p = 0.09, p < 0.01, and p = 0.80, respectively); the difference in the duration of tracheal intubation alone was significant (Fig. 3). These variables also showed significant differences between the peripheral and diffuse patterns and between the multifocal and diffuse patterns (p < 0.01 and p < 0.01, respectively).
The volume of compromised lung (% CL) differed significantly among the three CT patterns (p < 0.01; Fig. 4). On post-hoc analysis, there was a significant difference in % CL between the peripheral and diffuse patterns and between the multifocal and diffuse patterns (p < 0.01 for both).

Fig. 3 (partial caption) Density plot width indicates the frequency. There was a significant difference between the three CT patterns and duration of intubation (Kruskal-Wallis test, p < 0.0001). On the Mann-Whitney U test, there were significant differences between the peripheral and diffuse patterns and between the multifocal and diffuse patterns (p = 0.003 and p = 0.001, respectively). CT, computed tomography

There were significant differences in temperature, CRP, D-dimer, lymphocyte count, neutrophil count, LDH, KL-6, and administration of steroid and favipiravir between the three CT patterns (p < 0.05 each, Table 1).
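The three-group comparisons reported in this section rely on the Kruskal-Wallis test. A minimal sketch of that test is shown below for exactly three groups, where the chi-squared approximation has 2 degrees of freedom and the p value reduces to exp(−H/2). The analyses in this study were run in Stata; the function name and the example values here are hypothetical, and the chi-squared approximation is rough for very small groups.

```python
from math import exp


def kruskal_wallis_three_groups(groups):
    """Return (H, p) for exactly three groups, with tie correction."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected three non-empty groups")

    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)

    # Assign average ranks to tied values and accumulate the tie term.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sums = [0.0, 0.0, 0.0]
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r

    h = 12.0 / (n * (n + 1)) * sum(
        rs**2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)

    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction

    # Chi-squared survival function with 2 degrees of freedom.
    p = exp(-h / 2)
    return h, p


if __name__ == "__main__":
    # Hypothetical intubation durations in days (not study data).
    peripheral = [1, 2, 3]
    multifocal = [4, 5, 6]
    diffuse = [7, 8, 9]
    print(kruskal_wallis_three_groups([peripheral, multifocal, diffuse]))
```

Pairwise post-hoc comparisons, such as the Mann-Whitney U tests with Bonferroni adjustment mentioned in the text, would follow a significant overall result and are not covered by this sketch.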

Duration of intubation
We performed separate statistical analyses for patients with and without tracheal intubation (Table 2), with a comparison of their clinical parameters. Respiratory rate, white blood cell count, lymphocyte count, neutrophil count, CRP, LDH, KL-6, SpO2, D-dimer, steroid use, favipiravir use, and SOFA score were significantly correlated with intubation (p < 0.05; Tables 3 and 4). The three CT patterns predicted tracheal intubation, with an area under the receiver operating characteristic (ROC) curve of 0.77 (Fig. 5).
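For readers unfamiliar with how a three-level visual classification yields an area under the ROC curve, the sketch below treats the CT pattern as an ordinal score and computes the AUC as the probability that an intubated patient has a higher score than a non-intubated patient, with ties counted as one half. The source does not state how the ROC curve was constructed, so the ordinal coding (peripheral = 0, multifocal = 1, diffuse = 2), the function name, and the example data are all assumptions.

```python
# Assumed ordinal coding of the three patterns (not stated in the study).
PATTERN_SCORE = {"peripheral": 0, "multifocal": 1, "diffuse": 2}


def auc_from_patterns(patterns, intubated):
    """AUC for predicting intubation from the CT pattern.

    patterns:  list of pattern names, one per patient
    intubated: list of booleans (or 0/1), one per patient
    """
    if len(patterns) != len(intubated):
        raise ValueError("inputs must have the same length")
    scores = [PATTERN_SCORE[p] for p in patterns]
    positives = [s for s, y in zip(scores, intubated) if y]
    negatives = [s for s, y in zip(scores, intubated) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one patient in each outcome group")

    # Count pairs where the intubated patient scores higher; ties count 0.5.
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Six hypothetical patients, for illustration only.
    patterns = ["peripheral", "peripheral", "multifocal",
                "multifocal", "diffuse", "diffuse"]
    intubated = [0, 0, 0, 1, 1, 1]
    print(auc_from_patterns(patterns, intubated))
```

Because the predictor has only three levels, the corresponding ROC curve has at most two interior operating points, and many patient pairs are tied; the half-credit for ties is what makes this pairwise definition agree with the trapezoidal area under that curve.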

Discussion
The findings of this study show that patients with a diffuse pattern on pretreatment lung CT had a higher and more prolonged requirement for intubation. Furthermore, we tested quantitative CT analysis as an outcome predictor for COVID-19 using a semi-automated method. In univariate logistic regression analysis, both the three CT patterns and quantitative CT analysis were significant. However, only the three CT patterns were retained as an independent predictor of tracheal intubation in the multivariate logistic regression analysis, with patients with a diffuse CT pattern being at the highest risk and requiring a prolonged duration of intubation.

COVID-19 pneumonia has an extremely variable prognosis [17][18][19][20][21]. While 80% of patients are either asymptomatic or have mild symptoms, 20% develop severe or critical disease, which can be fatal [22][23][24][25]. CT imaging of the lungs plays an important role in the care of patients infected with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), with the prognostic value of CT evaluated in several studies [9][10][11][12][13][14][18][19][20][21]. However, most of these studies are from China, Europe, and the United States, with no information available for patient cohorts in Japan [22][23][24][25][26]. In addition, several researchers have used image analysis software to quantify CT findings, making these approaches impractical in daily clinical practice [25][26][27]. Practicality in clinical practice was our motivation to develop and evaluate the prognostic value of our visually-based assessment of CT lung findings (peripheral, multifocal, and diffuse) in a hospital patient cohort in Tokyo, Japan, a city with a particularly high SARS-CoV-2 prevalence.

COVID-19 pneumonia shares a similar pathogenesis to acute exacerbation of interstitial pneumonia. Volume loss because of alveolar collapse is the primary cause of traction bronchiectasis in COVID-19 pneumonia. However, the extent of involved alveoli and mucosa may expand with disease progression. In addition, inflammation-related damage to the bronchial walls may lead to fibrosis, bronchiectasis, and bronchial wall thickening. Thus, COVID-19 is associated with acute respiratory distress syndrome and may produce CT findings similar to those of acute exacerbation of interstitial pneumonia [17].

Akira et al. reported that the prognosis of acute exacerbation in interstitial pneumonia could be predicted by classifying findings on lung CT images into peripheral, multifocal, and diffuse patterns [15]. In our study, we applied the same classification to patients with COVID-19 and evaluated whether these three patterns were predictive of patients' clinical course. By simply classifying the CT images of a COVID-19 patient into three patterns, we can predict the requirement for and duration of intubation for that patient, allowing us to quickly and easily allocate optimal medical resources to the patient. In addition, this study revealed that the requirement for intubation can be predicted through our method as efficiently as, or better than, through quantitative analysis of CT images. This indicates that the prognosis of COVID-19 patients can be predicted to some extent by visual judgment of CT images, even in institutions that cannot introduce software for CT analysis.
We also compared the clinical variables among the three CT patterns. The following clinical variables were significantly different between the diffuse pattern and the peripheral and multifocal groups: body temperature, lymphocyte count, neutrophil count, C-reactive protein, lactate dehydrogenase, Krebs von den Lungen-6 antigen, D-dimer, and steroid and favipiravir administration. These results suggest that the more extensive the abnormal findings in the lungs, the more severe the systemic over-inflammatory response in COVID-19. The laboratory features presented by this study could be attributed to conditions such as respiratory failure and septic shock. The identification of prognostic factors at an early stage of the disease could help guide clinicians in providing an optimal treatment path based on patient-specific characteristics, as well as predict more precisely where medical resources are most required.

The reason for the lack of an association between the three CT patterns and either mortality or the length of hospital stay is unclear. One possible explanation is that the study population presumably included some of the earliest patients of the pandemic, when treatment methods were inconsistent. Another is that the aggressive use of steroids and favipiravir in patients with severe COVID-19 may have prevented a significant difference in mortality. We intend to collect more data in the future to clarify the association between CT findings and mortality.

The limitations of our study should be acknowledged in the interpretation of our results. First, this was a single-center retrospective study with a small sample size. Second, only Japanese patients from the city of Tokyo were included. As such, the risk of bias related to host factors and viral factors, such as genomic variation, cannot be discounted. We do note the benefit of a single center for ensuring a uniform assessment of images. Third, cases in the early stages of the COVID-19 pandemic in Japan were included in this study, and treatment methods were not consistent. Fourth, although we instructed all patients to breathe in as much as possible when CT was performed, patients with poor respiratory status may not have achieved a sufficient inhalation volume. This may have affected the quantitative analysis of CT. Fifth, our hospital is one of the facilities that preferentially accepts critically ill patients with COVID-19 in our area. Our hospital therefore admits patients with severe disease, and all patients who underwent CT were receiving supplemental oxygen on admission. This may have introduced selection bias into this study. Sixth, we did not follow the time course of the changes in CT findings in each case. Lastly, the inclusion criteria were limited to patients with pulmonary lesions on the initial CT scan; patients with no abnormalities on the initial CT scan were not included.

Conclusions
Our simple visual assessment of CT images can predict the severity of illness, a decrease in respiratory function, and the need for supplemental respiratory ventilation among patients with COVID-19.