
Data augmentation for a deep learning classification model of chest X-ray clinical imaging features in coal workers' pneumoconiosis



This paper aims to develop an effective deep learning model with a data augmentation technique to discover the unique clinical chest X-ray imaging features of coal workers' pneumoconiosis (CWP).

Patients and methods

We enrolled 149 CWP patients and 68 dust-exposed workers in a prospective observational cohort study between August 2021 and December 2021 at the First Hospital of Shanxi Medical University. Two hundred seventeen chest X-ray images were collected for this study; reliable diagnostic results were obtained from the radiologist team, and the clinical imaging features were confirmed. We segmented regions of interest with reference to the diagnosis reports and classified them into three categories. To identify these clinical features, we developed a deep learning model (ShuffleNet V2-ECA Net) with data augmentation, comparing the performance of different deep learning models using the receiver operating characteristic (ROC) curve and area under the curve (AUC), accuracy (ACC), and loss curves.


Results

We selected ShuffleNet V2-ECA Net as the optimal model. The average AUC of this model was 0.98, and all classifications of clinical imaging features had an AUC above 0.95.


Conclusion

We performed a study on a small dataset to classify the chest X-ray clinical imaging features of pneumoconiosis using a deep learning technique. A deep learning model combining ShuffleNet V2 and ECA-Net was successfully constructed with data augmentation and achieved an average accuracy of 98%. This method uncovered the uniqueness of the chest X-ray imaging features of CWP, supplying additional reference material for clinical application.



Background

Coal workers' pneumoconiosis (CWP) is an occupational lung disease whose characteristic pathological change is diffuse interstitial lung fibrosis [1] caused by prolonged exposure to excessive quantities of respirable coal dust. This condition can lead to irreversible and potentially fatal lung diseases, including chronic obstructive pulmonary disease, tuberculosis, chronic bronchitis, emphysema, and other lung diseases [2]. Although the prevalence of CWP has substantially decreased over the past few decades, according to recent research [2,3,4], the incidence in the eastern region of the United States and the state of Queensland in Australia has shown a worryingly increasing trend. In China, the prevalence of CWP appears high, with one report indicating that it constituted over 6% of all Chinese occupational diseases in the 2010s; specifically, over 13,000 miners were diagnosed with CWP in 2013 and 2014 [5]. Moreover, according to the China National Report of occupational diseases, CWP was one of the most prevalent types of pneumoconiosis among all age groups, and most pneumoconiosis cases were diagnosed among males [6].

Although pneumoconiosis is prevalent and there are no effective therapies for the treatment of CWP, an early diagnosis can broadly prevent the development of complications and aid in the development of early treatments for any that do arise. Currently, X-ray is the essential tool for suggesting whether there are any suspicious findings of pneumoconiosis via the identification of subtle graphic patterns and features described in the International Labour Organisation (ILO) guidelines. The typical radiological features of CWP include small nodular interstitial opacities in the upper zones; however, not all these features are exclusive to CWP [7]. A long-term evaluation of CWP in America found that almost 40% of coal miners with radiographical interstitial changes had predominantly irregular opacities [8]; on X-ray images, these irregular opacities may also be features of mixed-dust pneumoconiosis (MDP), which, pathologically, manifests as a pneumoconiosis showing dust macules or mixed-dust fibrotic nodules, with or without silicotic nodules, in an individual with a history of exposure to mixed dust [9]. The radiological findings of CWP with secondary lung disease might be similar to those seen in idiopathic pulmonary fibrosis [7, 10]. In addition, the respiratory symptoms of CWP are nonspecific and mostly overlap with other coal dust-related conditions, such as chronic bronchitis, chronic obstructive pulmonary disease (COPD), and emphysema [11]. Many CWP and dust-exposed workers often choose general hospitals for treatments when they begin coughing and presenting with dyspnoea. Generally, the imaging workup for chest diseases starts with a chest X-ray (CXR), but it has a limited role in diagnosing pulmonary complications of pneumoconiosis because of overlapping pneumoconiotic infiltration [12]. 
Patients with CWP experience many complications, such as chronic interstitial pneumonitis and pneumothorax; hence, knowledge of the CWP imaging features on chest radiographs is important for improving the rates of early diagnosis and cure, especially considering the difficulties in recognizing and treating the disease.

Prior studies have reported encouraging results in medical image analysis with artificial intelligence (AI) [13,14,15,16,17,18]. Although they are unlikely to replace radiologists for the foreseeable future, AI algorithms have achieved performance comparable to that of radiology experts in interpreting CXRs [19]. Deep learning (DL), a subdiscipline of AI, has emerged as a new solution for many medical image analysis problems, with remarkable success in classifying pneumoconiosis grade and exploring the application of AI in detecting pneumoconiosis [20, 21]. The merits of DL lie in its ability to learn complex imaging features or patterns inconspicuously without purposefully identifying and extracting them, as tens of millions of features may be involved and analysed to obtain high-level features [18].

Many kinds of deep learning models, also known as deep convolutional architectures, such as EfficientNet [22], VGGNet [23], ResNet [24], ZFNet [25], DenseNet [26], and GoogleNet [27], have been proposed; they provide end-to-end solutions for image feature extraction and are superior to traditional methods in almost all image recognition tasks [13]. In the field of pneumoconiosis screening and staging, the core design of previous experiments has mainly depended on massive datasets. Challenged by the inadequate analysis possible with limited data, we applied data augmentation to generate additional training data for our deep learning model, which helps make the model more generalisable and ultimately improves training performance.

In this paper, we proposed a deep learning classification model based on the data augmentation technique that could steadily improve the ability of radiologists to interpret CWP chest radiographs by screening and discovering the uniqueness of imaging features of CWP, thus potentially improving the comprehension of the clinical radiologic manifestations of pneumoconiosis.

Materials and methods

Patients and data collection

Patients with CWP and dust-exposed workers who voluntarily participated in the Coal Mining Workers' Pneumoconiosis and Dust Exposure Cohort Study (CONDUCT) between 28 August 2021 and 12 December 2021 were enrolled. CONDUCT was a CWP-based prospective cohort study conducted to understand the epidemiological characteristics of CWP, to explore risk factors for pneumoconiosis beyond dust exposure, and to encourage research on CWP in terms of early diagnosis, novel mechanisms, and drug discovery.

In the study reported here, we enrolled workers with confirmed CWP and dust-exposed workers from coal mines around Taiyuan City who presented with cough, dyspnoea, or other symptoms. Most of the patients were initially treated at a local tertiary-A integrated hospital for CXR assessment. Among the coal mining workers with the chronic respiratory symptoms listed above, we identified those who participated in CONDUCT after obtaining permission from the Taiyuan City Center for Disease Control and Prevention and provided them with yearly systematic clinical health checks. We classified the imaging features listed in the corresponding examination reports and extracted features from the target regions of the pneumoconiosis CXR images using the enhanced ShuffleNet V2 combined with ECA-Net. This model was trained on the target regions to classify the clinical imaging features of CWP/dust-exposed workers and to explore the application of deep learning in imaging feature detection for CWP.

All CXRs analysed in our study were collected from the first of three yearly follow-up surveys from CONDUCT in 2021. All patients included in this study signed informed consent forms, and the Ethics Committee of the First Hospital of Shanxi Medical University approved the study (2020 K-K104). The database established by the cohort study was registered in the Chinese Clinical Trial Registry (27/08/2021, ChiCTR2100050379). It included 217 anonymized CXRs from male patients aged 35–80 years with cough, dyspnoea, or other symptoms; 149 patients were placed in the CWP group (whose stages are defined in Table 1), and the remaining 68 patients were placed in the standard group (dust-exposed workers). For patients with multiple previous hospitalizations or visits, the data collected during this study were adopted. The duration of exposure to dust (for patients with a clear dust-exposure history) was more than 15 years, with an average of 28.95 ± 14 years. We present the patient baseline characteristics of the different groups in Table 2.

Table 1 Definition of CWP stages (according to GBZ70-2015)
Table 2 Baseline characteristics

Chest X-ray acquisition

We obtained chest radiographs using a mobile X-ray system (Carestream, DRX-Revolution VX3733-SYS, America), with a tube voltage of 120–150 kVp, exposure time of 100 ms, and optimum source-to-image distance (SID) of 180 cm. After calibration, the industry standard regulated by Digital Imaging and Communications in Medicine (DICOM) was required for the CXR images.

Classification and segmentation

To build the CWP CXR imaging feature classification system, we invited three radiologists with 5–10 years of experience in interpreting CXR to create imaging diagnosis reports for all CXRs, which were subsequently independently analysed. Then, two senior radiologists with over 10 years of experience reviewed these diagnostic results and made the final diagnostic decisions. During this period, the senior radiologists were not informed of any details of the CWP study.

The clinical CXR imaging features were confirmed based on the diagnostic reports by our experienced radiology team and included the following: (A) pulmonary nodules; (B) pulmonary interstitial changes; (C) emphysema; (D) pleural plaques; and (E) mediastinal masses. Previous research on pneumoconiosis classification proposed handcrafted feature extraction from each region of interest (ROI) [28, 29]. To identify the target lung region in each CXR, we segmented the ROIs (shown in Fig. 1) from the lung field via screenshot while referencing the diagnosis reports. These images were saved as JPEG images of appropriate size (at least 30 × 30 pixels). Of the 217 patients recruited in the study, one ROI was segmented from 171 patients (171 ROIs), two ROIs were segmented from 22 patients (44 ROIs), and three ROIs were segmented from the remaining 24 patients (72 ROIs). The ROIs were then assigned the following data labels: pulmonary nodules (n = 376), pulmonary interstitial changes (n = 116), and emphysema (n = 28); pleural plaques and mediastinal masses were excluded, as they were severely underrepresented on the CXRs (n < 10). Finally, all imaging features were classified into three categories: A (pulmonary nodules), B (pulmonary interstitial changes), and C (emphysema) (shown in Fig. 1). Table 3 shows the detailed distribution of the clinical CXR imaging features.

Fig. 1
figure 1

Original CXR (a) with the target lung region identified. Segmented regions of interest (ROIs) (b), classified into three types

Table 3 Distribution of clinical CXR imaging features

Deep learning CWP image data augmentation model

Due to the limitations of small datasets, Devnath et al. [30] first proposed transfer learning with a convolutional neural network (CNN) for the detection of pneumoconiosis on CXR. Subsequently, many studies [31, 32] confirmed CWP detection on CXR using data augmentation, which improved the quality of the deep CNN, increased the amount of training data, and outperformed other statistical and traditional machine learning approaches as well as radiologists. Considering the particular characteristics of medical images, we applied additional data transformations to the original images to selectively highlight specific regions or suppress unimportant ones, thereby enhancing the image data, making them more conducive to model training, and avoiding overfitting.

The flow chart of the method used in this study is shown in Fig. 2, and the confusion matrix of the accuracy of multiple models is shown in Fig. 3. In this study, through comprehensive comparisons, the original ShuffleNet model possessed higher classification accuracy and recall than the other models; thus, we chose it for modification to further improve its performance.

Fig. 2
figure 2

Flow chart of the procedure for classifying clinical CXR features in CWP

Fig. 3
figure 3

Comparison of accuracy in CWP classification with different algorithms


Practical integration of data augmentation is an important discussion point in the development of future deep learning workflows [33, 34]. As our dataset contained 520 ROIs from a total of 217 images, it was essential to make adequate use of the limited data. ShuffleNet is an exceedingly computationally efficient CNN model proposed by Zhang et al. in 2018, who designed it around two new operations, pointwise group convolution and channel shuffling, to significantly reduce the computational cost while maintaining accuracy [35]. On ImageNet classification and MS COCO object detection, their study demonstrated the superior performance of ShuffleNet over other structures and highlighted the operational efficiency of group convolution, especially for small datasets.
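The channel shuffle operation at the heart of ShuffleNet is simple to express; a minimal PyTorch sketch (our own rendering, not the authors' code) is:

```python
import torch


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Mix channels across convolution groups (ShuffleNet's core operation)."""
    n, c, h, w = x.shape
    # split channels into groups, swap the group/channel axes, flatten back
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)
```

After a pointwise group convolution, this permutation lets the next group convolution see channels from every group, which is what keeps accuracy high at low computational cost.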

Efficient channel attention (ECA)-Net

Some researchers have claimed that a multiscale augmentation strategy is crucial for data expansion and thus for increasing the accuracy of classification models [20]. The channel attention mechanism has been proven to improve the accuracy of CNN models; however, the performance improvement achieved by incorporating such sophisticated attention methods unavoidably increases model complexity. In 2020, Wang et al. [36] proposed an efficient channel attention (ECA) module that requires only a handful of parameters while providing clear performance improvements. ECA-Net is a “plug-and-play”, lightweight attention module that generates channel attention using fast 1D convolution to ensure model efficiency and accuracy. Given these advantages, ECA attention can significantly improve model accuracy even with limited sample data, increasing the ability to classify imaging features.
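A minimal PyTorch sketch of the ECA module as described by Wang et al. (our own rendering with the adaptive kernel-size rule from their paper; the `gamma` and `b` defaults are illustrative):

```python
import math

import torch
import torch.nn as nn


class ECA(nn.Module):
    """Efficient channel attention: global average pool + fast 1D convolution."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # kernel size adapts to the channel count and is forced to be odd
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (N, C, H, W) -> (N, C, 1, 1) channel descriptor
        y = x.mean(dim=(2, 3), keepdim=True)
        # 1D convolution across channels, with no dimensionality reduction
        y = self.conv(y.squeeze(-1).transpose(1, 2)).transpose(1, 2).unsqueeze(-1)
        return x * self.sigmoid(y)
```

Because the only learned weights are those of one small 1D convolution, the module adds a negligible number of parameters, which matches the "plug-and-play" characterization above.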

Image data augmentation

Current studies [37, 38] suggest that image data augmentation can be divided into two major categories: basic image manipulations and deep learning-based approaches. This study applied augmentation techniques based on basic image manipulations. All ROIs were segmented manually, despite the potential lack of objectivity; to expand the ROI dataset, increase the robustness of the model, and guard against overfitting during training, we performed image data augmentation as follows. First, we split the original dataset into three groups: a training group (70% of the data), a validation group (20%), and a test group (10%). Second, ROIs were randomly selected from the different groups in the original dataset and processed with the various image dataset augmentation techniques (shown in Table 4). These operations yielded a total of over 12,000 training images.
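As an illustration of the basic-manipulation category, the following sketch expands a set of ROI images with random flips, rotations, and brightness jitter. The specific operations and their parameters here are hypothetical stand-ins; the operations actually used are those listed in Table 4.

```python
import random

import numpy as np


def augment(img, seed=None):
    """Apply a random combination of basic manipulations to one ROI image."""
    rng = random.Random(seed)
    out = img.copy()
    if rng.random() < 0.5:                   # horizontal flip
        out = np.fliplr(out)
    if rng.random() < 0.5:                   # vertical flip
        out = np.flipud(out)
    out = np.rot90(out, k=rng.randrange(4))  # rotation by a multiple of 90 degrees
    scale = rng.uniform(0.9, 1.1)            # mild brightness jitter
    out = np.clip(out * scale, 0, 255)
    return out.astype(img.dtype)


def expand_dataset(images, copies_per_image=25):
    """Generate augmented copies; ~500 ROIs x 25 copies gives ~12,500 images."""
    return [augment(im, seed=i) for im in images for i in range(copies_per_image)]
```

Each augmented copy preserves the ROI's label, so the label set is simply repeated alongside the expanded image list.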

Table 4 Detailed classification of image dataset augmentation

Implementation details

All ROIs were divided into ten groups (ten-fold cross-validation); each group was adopted in turn as the test data, with the remaining nine groups used for training, and each training batch contained 16 images. We set the initial learning rate to 0.0001, the number of training epochs to 30, and the classification output to three types. These steps were implemented with PyTorch on a single RTX 2080Ti GPU.
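The ten-fold split described above can be sketched with a generic helper (not the authors' code):

```python
import random


def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Each of the k rounds holds out one fold as the test set and trains on the other nine, so every ROI is used for testing exactly once.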

ShuffleNet V2 combined with ECA-Net (ShuffleNet-Attention)

The ShuffleNet V2 network is an optimized version of the ShuffleNet structure based on criteria such as optimal memory access cost (MAC), reduced network fragmentation, and reduced elementwise operations [39]. Several preliminary experiments showed that using the original ShuffleNet for classification training consistently confused pulmonary interstitial changes with emphysema. To avoid this feature misidentification, we chose ShuffleNet V2 as the model backbone and added the ECA-Net mechanism after each convolution layer, creating ShuffleNet-Attention (Additional files 1, 2, 3, 4). Through this combination, the convolution layers of each block are simplified while more semantic information is conveyed to the feature layers. This reduced the computational complexity by more than 40% with respect to the original network while markedly increasing the classification accuracy (from 92% to 94%).
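The combination described here (an attention module after each convolution layer) can be sketched generically; `make_attn` is a hypothetical factory for the attention module (e.g. an ECA instance), and this helper is our illustration, not the authors' implementation:

```python
import torch.nn as nn


def insert_attention_after_convs(block: nn.Sequential, make_attn) -> nn.Sequential:
    """Rebuild a sequential block with a channel-attention module
    (e.g. ECA) inserted after every 2D convolution layer."""
    layers = []
    for layer in block:
        layers.append(layer)
        if isinstance(layer, nn.Conv2d):
            # make_attn receives the channel count of the preceding conv
            layers.append(make_attn(layer.out_channels))
    return nn.Sequential(*layers)
```

Applying a helper like this to each ShuffleNet V2 stage yields a ShuffleNet-Attention-style model; because ECA adds only a handful of parameters, the extra cost is negligible.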

Evaluation metrics

As performance metrics for the model, accuracy, loss function, precision, recall, F1-score, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were evaluated. The evaluation metrics involved in this study are calculated as follows:

$${\text{Accuracy}} = \frac{{{\text{Total}}\,{\text{of}}\,{\text{all}}\,{\text{true}}\,{\text{classifications}}}}{{{\text{Total}}\,{\text{of}}\,{\text{all}}\,{\text{image}}\,{\text{classifications}}}}$$

Accuracy is defined here as the number of correctly classified CWP clinical features divided by the total number of CWP clinical feature classifications evaluated. In deep learning classification tasks, the loss function is commonly applied to optimize the parameters of the model, which can improve the classification accuracy and effectively promote the weights of categories with smaller sample counts.

$${\text{Recall}} = \frac{{{\text{Total}}\,{\text{of}}\,{\text{all}}\,{\text{true}}\,{\text{positive}}\,{\text{classifications}}\,{\text{(TP)}}}}{{{\text{Total}}\,{\text{of}}\,{\text{all}}\,{\text{true}}\,{\text{positive}}\,{\text{classifications}}\,{\text{(TP)}} + {\text{Total}}\,{\text{of}}\,{\text{all}}\,{\text{false}}\,{\text{negative}}\,{\text{classifications}}\,{\text{(FN)}}}}$$

Recall is defined here as the proportion of actual CWP clinical features that were correctly classified.

$${\text{Precision}} = \frac{{{\text{Total}}\,{\text{of}}\,{\text{all}}\,{\text{true}}\,{\text{positive}}\,{\text{classifications}}\,{\text{(TP)}}}}{{{\text{Total}}\,{\text{of}}\,{\text{all}}\,{\text{true}}\,{\text{positive}}\,{\text{classifications}}\,{\text{(TP)}} + {\text{Total}}\,{\text{of}}\,{\text{all}}\,{\text{false}}\,{\text{positive}}\,{\text{classifications}}\,{\text{(FP)}}}}$$

Precision measures the proportion of positive classifications of CWP clinical features that are correct.

$${\text{F1 - score}} = \frac{{{2} \times {\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}}$$

A high F1-score reflects good classification performance, as it is the harmonic mean of precision and recall. The F1-score reaches its highest value at 1 and its lowest value at 0.

To correctly evaluate the overall performance of the DL-model classifier, we also assessed the classification accuracy for CWP clinical features using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). The ROC curve, which has been widely used in medicine to evaluate the performance of deep learning diagnostic methods, plots the true positive rate against the false positive rate at different threshold values; the AUC is the area under the ROC curve and represents the quality of the DL model, with a larger value indicating a better classification effect.
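The metrics above can be computed per class directly from the predicted and true labels; a plain NumPy sketch (not tied to the authors' code) is:

```python
import numpy as np


def classification_metrics(y_true, y_pred, positive_class):
    """Accuracy over all classes plus one-vs-rest precision, recall, and F1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive_class) & (y_true == positive_class))
    fp = np.sum((y_pred == positive_class) & (y_true != positive_class))
    fn = np.sum((y_pred != positive_class) & (y_true == positive_class))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = np.mean(y_true == y_pred)  # fraction of correct classifications
    return {"accuracy": float(accuracy), "precision": float(precision),
            "recall": float(recall), "f1": float(f1)}
```

For a three-class problem like this one (pulmonary nodules, pulmonary interstitial changes, emphysema), calling this once per class and averaging gives macro-averaged metrics.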


Results

We summarized the model evaluation results on the test dataset (Table 5), the accuracy of CWP imaging feature classification with different models (Fig. 3), the accuracy versus the training epochs (Fig. 4), and the model loss versus the epochs (Fig. 5). ShuffleNet V2 combined with ECA attention (ShuffleNet-Attention) reached its minimum value between epochs five and ten, slightly lower than the other models (Fig. 4). With sufficient training, the accuracy increased significantly after epoch 15, exceeding 95% after epoch 25, while the loss curve (Fig. 5) reached its lowest point after epoch 25; thus, ShuffleNet-Attention ultimately achieved the best performance among these models.

Table 5 Performance of ShuffleNet v2, ResNet 50, GoogleNet, DenseNet 121 and ShuffleNet-Attention on the test set
Fig. 4
figure 4

Classification accuracy of the different models versus training epochs

Fig. 5
figure 5

Classification loss of the different models versus training epochs

Therefore, we selected ShuffleNet-Attention as the optimal model following this comparison and analysis. According to ROC curve analysis, the average AUC was 0.98; more specifically, the model achieved an AUC of 0.97 in classifying pulmonary nodules (Class A), 0.96 in classifying pulmonary interstitial changes (Class B), and 1.0 in classifying emphysema (Class C), as demonstrated in Fig. 6.

Fig. 6
figure 6

ROC curves for the classification of each category. Class A: pulmonary nodules, Class B: pulmonary interstitial changes, Class C: emphysema


Discussion

In the present study, we built a reliable model for classifying pneumoconiosis clinical imaging features by utilizing deep learning algorithms and well-annotated chest radiographs. To identify the clinical CXR imaging features of pneumoconiosis patients and dust-exposed workers more quickly and effectively, a computer-aided classification system was constructed by combining ShuffleNet V2 with ECA-Net.

Several previous studies have shown satisfactory performance, relative to radiologists, of automatic DL-based models for pneumoconiosis screening and staging with CXR images. Among these studies [18, 21], the good performance was primarily attributed to the size of the datasets, which contained approximately 2000 chest radiographs from multiple centres or devices. When comparing evaluation metrics for different DL algorithms in interpreting pneumoconiosis, these studies selected the best-performing algorithm to learn the image features and obtain an accurate classification of the pneumoconiosis grade. Deep learning still requires larger datasets in which the lesion appearances of interest vary significantly, implying that massive datasets play a vital role in these studies. In another study, Zhang et al. [20] analysed a total of 405 DR images for screening and staging pneumoconiosis. They performed extensive image augmentation with a limited amount of training data, which vastly reduced the influence of sampling conditions, image contrast, and lung size and improved the accuracy of the model. Unlike the above studies, our study focused specifically on CWP and aimed to analyse its imaging features from the perspective of secondary prevention to further improve the understanding of the disease in a clinical environment. In addition, CWP is the most common pneumoconiosis among dust-exposed workers; however, it is an entirely different disease from other pneumoconioses, such as silicosis. Even though their imaging features seem similar, there are considerable differences between CWP and other pneumoconioses in terms of pathological characteristics, which makes specifically identifying CWP cases a challenge. We therefore consider it essential to analyse and explore the key variables based on clinical knowledge of CWP.

This study focuses on classifying pneumoconiosis clinical imaging features for the following reasons: (1) Instead of screening and staging pneumoconiosis by only analysing lung regions on chest radiographs, it may be more useful to analyse the characteristics of CWP with a clinical eye and classify the clinical imaging features, as this may better support the differential diagnosis of CWP and thus the early evaluation of the condition and the administration of early treatment. (2) Workers exposed to coal minerals have an increased risk of CWP, and the diagnostic process for the multiple classifications of pneumoconiosis is complex. Deep learning techniques that incorporate more clinically significant variables improve data extraction and give the prediction models high accuracy.


Our study has several limitations. First, the sample size might be too small to obtain more detailed features for deep learning. We attempted to define the clinical difference between dust-exposed workers and patients with different CWP stages by classifying various imaging features. However, the sample comprised only 217 patients. While this study had sufficient power to discern classification outcomes correctly among these groups, larger sample sizes will be needed to verify the accuracy of our study in the future. Second, specific imaging features, such as the position of pleural plaques and mediastinal masses, were not included in the DL model or analysed, as the algorithm was not trained on those features due to their low incidence in our dataset. A pooled analysis using data on these features may be beneficial in expanding clinical knowledge of different aspects of CWP. Third, compared with the pulmonary nodule and pulmonary interstitial change datasets, the emphysema dataset may not have been sufficiently large to demonstrate significant feature differences, which might have contributed to the high accuracy for this classification. Shorten et al. suggested that data augmentation cannot overcome all biases in a small dataset; however, it can prevent overfitting by modifying limited datasets to better represent the characteristics of big data [33], which implies the need to obtain more imaging data for verification. Moreover, there are many limitations in identifying emphysema imaging features on chest radiographs; computed tomography (CT) could provide more specific emphysema imaging features and should be analysed in future studies.

In addition, the clinical diagnosis of pneumoconiosis in China may be challenging. Hence, we believe that it remains crucial to better manage affected patients and to analyse valid clinical data. There may still be a lack of well-understood relationships between pneumoconiosis and imaging features on chest radiographs, and further research is needed.


Conclusions

In this study, we classified the clinical CXR imaging features of CWP using a deep learning technique on a small dataset, and a data augmentation model was successfully constructed by combining ShuffleNet V2 and ECA-Net. ShuffleNet-Attention demonstrated the best performance among the models investigated, achieving an average accuracy of 98% on the imaging dataset. While the proposed method cannot screen or stage pneumoconiosis, the successful data augmentation model could assist radiologists in discovering the uniqueness of the imaging features of CWP on chest radiographs, thus supplying more references for clinical application.

Our study was part of a multicentre prospective cohort study; future work will be devoted to testing and verifying the application of the DL-based model in a clinical environment and obtaining more imaging data to support further research.

Availability of data and materials

The chest X-ray imaging datasets used and/or analysed during the current study are not publicly available due to the policies of the funding institutions; the research data may be disclosed after the whole study is completed but are available from the corresponding author on reasonable request. The direct online link to the deep learning model data is available at


  1. Wang T, Sun W, Wu H, Cheng Y, Li Y, Meng F, Ni C. Respiratory traits and coal workers’ pneumoconiosis: Mendelian randomisation and association analysis. Occup Environ Med. 2021;78(2):137–41.


  2. Xu G, Chen Y, Eksteen J, Xu J. Surfactant-aided coal dust suppression: A review of evaluation methods and influencing factors. Sci Total Environ. 2018;639:1060–76.


  3. Blanc PD, Seaton A. Pneumoconiosis Redux. Coal workers’ pneumoconiosis and silicosis are still a problem. Am J Respir Crit Care Med. 2016;193(6):603–5.


  4. Qi XM, Luo Y, Song MY, Liu Y, Shu T, Liu Y, Pang JL, Wang J, Wang C. Pneumoconiosis: current status and future prospects. Chin Med J (Engl). 2021;134(8):898–907.


  5. Han L, Gao Q, Yang J, Wu Q, Zhu B, Zhang H, Ding B, Ni C. Survival analysis of coal workers’ pneumoconiosis (CWP) patients in a state-owned mine in the East of China from 1963 to 2014. Int J Environ Res Public Health. 2017;14(5):489.


  6. Zhao JQ, Li JG, Zhao CX. Prevalence of pneumoconiosis among young adults aged 24–44 years in a heavily industrialized province of China. J Occup Health. 2019;61(1):73–81.


  7. Perret JL, Plush B, Lachapelle P, Hinks TS, Walter C, Clarke P, Irving L, Brady P, Dharmage SC, Stewart A. Coal mine dust lung disease in the modern era. Respirology. 2017;22(4):662–70.


  8. Laney AS, Petsonk EL. Small pneumoconiotic opacities on U.S. coal worker surveillance chest radiographs are not predominantly in the upper lung zones. Am J Ind Med. 2012;55(9):793–8.


  9. Honma K, Abraham JL, Chiyotani K, De Vuyst P, Dumortier P, Gibbs AR, Green FH, Hosoda Y, Iwai K, Williams WJ, et al. Proposed criteria for mixed-dust pneumoconiosis: definition, descriptions, and guidelines for pathologic diagnosis and clinical correlation. Hum Pathol. 2004;35(12):1515–23.


  10. Brichet A, Tonnel AB, Brambilla E, Devouassoux G, Remy-Jardin M, Copin MC, Wallaert B. Groupe d’Etude en Pathologie Interstitielle de la Societe de Pathologie Thoracique du N: Chronic interstitial pneumonia with honeycombing in coal workers. Sarcoidosis Vasc Diffuse Lung Dis. 2002;19(3):211–9.


  11. Coggon D, Newman Taylor A. Coal mining and chronic obstructive pulmonary disease: a review of the evidence. Thorax. 1998;53(5):398–407.


  12. Jun JS, Jung JI, Kim HR, Ahn MI, Han DH, Ko JM, Park SH, Lee HG, Arakawa H, Koo JW. Complications of pneumoconiosis: radiologic overview. Eur J Radiol. 2013;82(10):1819–30.


  13. Devnath L, Luo S, Summons P, Wang D. Automated detection of pneumoconiosis with multilevel deep features learned from chest X-Ray radiographs. Comput Biol Med. 2021;129:104125.


  14. Hao C, Jin N, Qiu C, Ba K, Wang X, Zhang H, Zhao Q, Huang B. Balanced convolutional neural networks for pneumoconiosis detection. Int J Environ Res Public Health. 2021;18(17).

  15. Singh R, Kalra MK, Nitiwarangkul C, Patti JA, Homayounieh F, Padole A, Rao P, Putha P, Muse VV, Sharma A, et al. Deep learning in chest radiography: detection of findings and presence of change. PLoS ONE. 2018;13(10):e0204155.


  16. Wang J, Wu CJ, Bao ML, Zhang J, Wang XN, Zhang YD. Machine learning-based analysis of MR radiomics can help to improve the diagnostic performance of PI-RADS v2 in clinically relevant prostate cancer. Eur Radiol. 2017;27(10):4082–90.


  17. Zhou M, Scott J, Chaudhury B, Hall L, Goldgof D, Yeom KW, Iv M, Ou Y, Kalpathy-Cramer J, Napel S, et al. Radiomics in brain tumor: image assessment, quantitative feature descriptors, and machine-learning approaches. AJNR Am J Neuroradiol. 2018;39(2):208–16.


  18. Wang X, Yu J, Zhu Q, Li S, Zhao Z, Yang B, Pu J. Potential of deep learning in assessing pneumoconiosis depicted on digital chest radiography. Occup Environ Med. 2020;77(9):597–602.


  19. Rajpurkar P, Irvin J, Ball RL, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz CP, et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 2018;15(11):e1002686.


  20. Zhang L, Rong R, Li Q, Yang DM, Yao B, Luo D, Zhang X, Zhu X, Luo J, Liu Y, et al. A deep learning-based model for screening and staging pneumoconiosis. Sci Rep. 2021;11(1):2201.


  21. Yang F, Tang ZR, Chen J, Tang M, Wang S, Qi W, Yao C, Yu Y, Guo Y, Yu Z. Pneumoconiosis computer aided diagnosis system based on X-rays and deep learning. BMC Med Imaging. 2021;21(1):189.


  22. Tan M, Le QV. EfficientNet: rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946; 2019.

  23. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556; 2014.

  24. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. p. 770–8.

  25. Zeiler M, Fergus R. Visualizing and understanding convolutional networks. Cham: Springer; 2014.


  26. Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017.

  27. Szegedy C, Liu W, Jia Y, Sermanet P, Rabinovich A. Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015.

  28. Yu P, Zhao J, Hao X, Sun X, Ling M. Computer aided detection for pneumoconiosis based on co-occurrence matrices analysis. In: 2009 International Conference on Biomedical Engineering and Informatics; 2009.

  29. Sundararajan R, Xu H, Annangi P, Tao X, Mao L. A multiresolution support vector machine based algorithm for pneumoconiosis detection from chest radiographs. In: 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Rotterdam, The Netherlands, 14–17 April 2010; 2010.

  30. Devnath L, Luo S, Summons P, Wang D. An accurate black lung detection using transfer learning based on deep neural networks. In: 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ); 2019.

  31. Devnath L, Summons P, Luo S, Wang D, Shaukat K, Hameed IA, Aljuaid H. Computer-aided diagnosis of coal workers' pneumoconiosis in chest X-ray radiographs using machine learning: a systematic literature review. Int J Environ Res Public Health. 2022;19(11):6439.


  32. Wang D, Arzhaeva Y, Devnath L, Qiao M, Yates D. Automated pneumoconiosis detection on chest X-rays using cascaded learning with real and synthetic radiographs. In: 2020 Digital Image Computing: Techniques and Applications (DICTA); 2020.

  33. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1):60.


  34. Zhang X, Wang Z, Liu D, Ling Q. DADA: deep adversarial data augmentation for extremely low data regime classification. 2018.

  35. Zhang X, Zhou X, Lin M, Sun J. ShuffleNet: an extremely efficient convolutional neural network for mobile devices. arXiv preprint arXiv:1707.01083; 2017.

  36. Wang Q, Wu B, Zhu P, Li P, Hu Q. ECA-Net: efficient channel attention for deep convolutional neural networks. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020.

  37. Lumini A. Comparison of different image data augmentation approaches. J Imaging. 2021;7:254.


  38. Chlap P, Min H, Vandenberg N, Dowling J, Holloway L, Haworth A. A review of medical image data augmentation techniques for deep learning applications. J Med Imaging Radiat Oncol. 2021;65(5):545–63.


  39. Qi A, Tian N. Fine-grained vehicle recognition method based on improved ResNet. In: 2020 2nd International Conference on Information Technology and Computer Application (ITCA); 2020.



Acknowledgements

All authors gratefully acknowledge the financial support from the National Health Commission Key Laboratory of Pneumoconiosis Shanxi China Project and the Shanxi Province Key Laboratory of Respiratory Disease. We thank radiologist Dr. Du Lijie and her team at First Hospital of Shanxi Medical University for providing imaging diagnosis reports on all chest X-rays used in this study.


Funding

Organization: The National Health Commission Key Laboratory of Pneumoconiosis Shanxi China Project; Project number: No. 2020-PT320-005; Project leader: Xinri Zhang.

Author information

Authors and Affiliations



Contributions

HD designed the research, participated in image data collection, interpretation, and preprocessing, and wrote the paper, contributing significantly to the analysis. BZ built the deep learning model and performed the statistical analysis. XZ and XK reviewed the manuscript and contributed significantly to the interpretation of all data and the critical evaluation of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xinri Zhang.

Ethics declarations

Ethics approval and consent to participate

All patients included in this study signed informed consent forms. The study was approved by the Ethics Committee of the First Hospital of Shanxi Medical University (2020K-K104), and all methods were performed in accordance with the relevant guidelines and regulations of the Declaration of Helsinki.

Consent for publication

Not applicable.

Informed consent

All patients included in this study signed the informed consent forms.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

ShuffleNet original model.

Additional file 2

ShuffleNet V2-Attention pattern graph.

Additional file 3

The structure of ShuffleNet V2-Attention data augmentation.

Additional file 4

Database of this deep learning model (GitHub).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Dong, H., Zhu, B., Zhang, X. et al. Use data augmentation for a deep learning classification model with chest X-ray clinical imaging featuring coal workers' pneumoconiosis. BMC Pulm Med 22, 271 (2022).




Keywords

  • Coal workers' pneumoconiosis classification
  • Chest X-ray
  • Deep learning
  • ShuffleNet
  • ECA-Net
  • Data augmentation