A multicentre prospective observational study comparing arterial blood gas values to those obtained by pulse oximeters used in adult patients attending Australian and New Zealand hospitals

Background: Pulse oximetry is widely used in the clinical setting. The purpose of this validation study was to investigate the level of agreement between oxygen saturations measured by pulse oximeter (SpO2) and arterial blood gas (SaO2) in a range of oximeters in clinical use in Australia and New Zealand.

Methods: Paired SpO2 and SaO2 measurements were collected from 400 patients in one Australian and two New Zealand hospitals. The ages of the patients ranged from 18 to 95 years. Bias and limits of agreement were estimated. Sensitivity and specificity for detecting hypoxaemia, defined as SaO2 < 90%, were also estimated.

Results: The majority of participants were recruited from the Outpatient, Ward or High Dependency Unit setting. Bias, oximeter-measured minus arterial blood gas-measured oxygen saturation, was −1.2%, with limits of agreement −4.4 to 2.0%. SpO2 was at least 4% lower than SaO2 for 10 (2.5%) of the participants and SpO2 was at least 4% higher than the SaO2 in 3 (0.8%) of the participants. None of the participants with a SpO2 ≥ 92% were hypoxaemic, defined as SaO2 < 90%. There were no clinically significant differences in oximetry accuracy in relation to clinical characteristics or oximeter brand.

Conclusions: In the majority of the participants, pulse oximetry was an accurate method to assess SaO2 and had good performance in detecting hypoxaemia. However, in a small proportion of participants, differences between SaO2 and SpO2 could have clinical relevance in terms of patient monitoring and management. A SpO2 ≥ 92% indicates that hypoxaemia, defined as a SaO2 < 90%, is not present.

Trial registration: Australian and New Zealand Clinical Trials Registry (ACTRN12614001257651). Date of registration: 2/12/2014.


Background
Pulse oximeter measured oxygen saturation (SpO2) is a noninvasive approximation of arterial oxygen saturation, and is considered the fifth vital sign in clinical assessment [1-3]. In clinical practice, monitoring of SpO2 values is required to titrate oxygen therapy to avoid the risks of hypoxaemia and hyperoxaemia [1,2].
Assessment of agreement between the gold standard arterial blood gas (ABG) measurement of oxygen saturation (SaO2) and SpO2 is essential for the interpretation and use of pulse oximetry values. It is also essential for the development of safe and practical recommendations for SpO2 targets for the titration of oxygen therapy. Overestimation of actual SaO2 may mean clinically relevant hypoxaemia is not detected or treated. Conversely, underestimation of actual SaO2 may result in unnecessary oxygen therapy with the associated risks of hyperoxaemia.
The United States regulatory body, the Food and Drug Administration (FDA), requires the accuracy of pulse oximeters to be tested against SaO2 in healthy adults in laboratory settings [4]. In clinical practice a number of factors influence oximeter accuracy, including the degree of hypoxaemia, hypercapnia, glycosylated haemoglobin (HbA1c), skin pigmentation, movement artefacts, peripheral perfusion and use of nail polish or acrylic nails [3,5-12]. Clinical studies report that SpO2 can both over- and underestimate SaO2, and the values may have wide limits of agreement. In addition, oximeter accuracy may differ by oximeter model [7,8,12,18,19]. Manufacturers are continuously evolving sensor technology and software algorithms [3]. This means previous studies may not be directly relevant to current clinical practice because of the population groups and oximeter models used.
In our recent study investigating the accuracy of oximeters used in Australian and New Zealand Intensive Care Units (ICUs), we demonstrated a mean bias for SaO2 minus SpO2 of only 0.15%, with limits of agreement plus or minus 4.4% [18]. In this study we aimed to investigate the agreement between SaO2 and SpO2 measurements by oximeters currently in use in Australian and New Zealand hospitals outside the critical care setting, either on the ward or in the Emergency Department (ED), High Dependency Unit (HDU) or outpatient departments. Secondary objectives were to evaluate the diagnostic performance of SpO2 to detect hypoxaemia, and to investigate factors affecting oximeter accuracy.

Methods
This multicentre prospective non-experimental observational study compared simultaneous SpO2 and SaO2 measurements in inpatients and outpatients at Westmead Hospital in Australia, and Wellington and Christchurch Hospitals in New Zealand. It was prospectively registered on the Australian and New Zealand Clinical Trials Registry (ACTRN12614001257651). Ethical approval was obtained from the Northern B Ethics Committee in New Zealand (14/NTB/115) and the Western Sydney Local Health District Human Research Ethics Committee in Australia (LNR/14/WMEAD/387).
Patients aged 16 years or older who were to have an ABG measurement as part of routine clinical care were recruited. Full written informed consent was provided in New Zealand by participants, or by next of kin if participants were unable to do so (for example, if they were too unwell). Participants were not recruited if they had a diagnosis of sickle cell anaemia, methaemoglobinaemia or carbon monoxide (CO) poisoning, or if they had previously been recruited to the study and had paired SpO2 and SaO2 values successfully recorded. They could also be excluded for any other condition which, at the investigator's discretion, was believed to present a safety risk or to affect the feasibility of the study or the interpretation of the study results.
Participants were identified in hospital wards and outpatient clinics. Demographic data were recorded. Skin colour was assessed using the Fitzpatrick scale [33].
SpO2 was measured during a clinically indicated ABG. The oximeter probe was put in place for at least 10 s prior to the ABG, or longer if indicated by the manufacturer's instructions. SpO2 was measured from an earlobe or finger probe, depending on departmental policies and what the staff member responsible for performing oximetry would usually use to monitor that patient. If a finger probe was used, it was placed on the index finger on the contralateral side to ABG sampling. Where possible, nail polish was removed before measurement.
The SpO2 value recorded was the value on the oximeter when blood was first observed to enter the ABG collection vial. If the participant was receiving supplementary oxygen at the time of the ABG, this was also recorded. Measurements paired with ABG samples subsequently identified to be venous or unusable, e.g. sample too small for analysis, were excluded. The models of oximeter and ABG analyser were recorded. Data recorded from the ABG were SaO2, partial pressure of oxygen (PaO2), partial pressure of carbon dioxide (PaCO2), carboxyhaemoglobin (COHb), methaemoglobin (MetHb) and HbA1c, if measured as part of clinical practice. Investigators were asked to record whether they had any concerns with oximeter accuracy, such as nail polish that was not removed, poor oximeter signal, or patient movement. Participants in whom there was a reported concern with oximeter accuracy were not excluded from analyses.
Bland-Altman plots and estimation of bias and limits of agreement were used to describe the agreement between SpO2 and SaO2 measurement, using SaO2 as the reference standard.
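As an illustration of the agreement statistics named above, the following is a minimal sketch of a Bland-Altman bias and limits-of-agreement calculation. It assumes the conventional limits of bias ± 1.96 × SD of the paired differences; the function name `bland_altman` and the example values are hypothetical and are not study data or the authors' SAS code.

```python
from statistics import mean, stdev


def bland_altman(spo2, sao2):
    """Return (bias, lower limit, upper limit) for SpO2 minus SaO2.

    Assumes the conventional 95% limits of agreement:
    bias +/- 1.96 * SD of the paired differences.
    """
    if len(spo2) != len(sao2) or len(spo2) < 2:
        raise ValueError("need at least two paired measurements")
    # SaO2 is treated as the reference standard, so differences are SpO2 - SaO2.
    diffs = [s - a for s, a in zip(spo2, sao2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up illustrative pairs (percent saturation), not study data.
example_spo2 = [94, 96, 91, 98, 93, 95]
example_sao2 = [95, 97, 93, 98, 95, 96]
bias, lower, upper = bland_altman(example_spo2, example_sao2)
print(f"bias={bias:.2f}, limits of agreement {lower:.2f} to {upper:.2f}")
```

Under the same convention, limits of agreement spanning 3.2 percentage points either side of the bias would correspond to an SD of the differences of roughly 3.2 / 1.96 ≈ 1.6 percentage points.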
The diagnostic performance of SpO2 < 90% to detect hypoxaemia, defined either as a SaO2 < 90% or as a PaO2 < 60 mmHg, was evaluated using contingency tables, with sensitivities and specificities estimated by an exact binomial method for proportions. A post hoc analysis of the ability of SaO2 < 90% to detect a PaO2 < 60 mmHg was performed using the same methods.
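The contingency-table calculation described above can be sketched as follows. This assumes that the "exact binomial method" refers to Clopper-Pearson confidence intervals, which the text does not state explicitly; the helper names and the counts in the example are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_root(f):
    """Root of a function that increases from negative to positive on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson (exact binomial) interval for k successes out of n."""
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else _bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else _bisect_root(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


def diagnostic_performance(tp, fn, fp, tn):
    """Sensitivity and specificity, each as (estimate, CI lower, CI upper)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": (sens, *exact_ci(tp, tp + fn)),
        "specificity": (spec, *exact_ci(tn, tn + fp)),
    }


# Hypothetical counts: test positive = SpO2 < 90%, condition = SaO2 < 90%.
result = diagnostic_performance(tp=18, fn=2, fp=5, tn=75)
for name, (est, lo, hi) in result.items():
    print(f"{name}: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

The same function applies unchanged when hypoxaemia is instead defined as PaO2 < 60 mmHg; only the counts in the contingency table differ.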
Associations with mean bias were illustrated by a scatter plot with a scatter plot smoother and a Spearman rank-correlation coefficient for SaO2, and by ANOVA for the categorical variables in Table 1. The mean difference between categories was assessed with an F-test. Where a categorical variable only had one observation it was not used in the ANOVA. If important predictors of bias were identified, it was planned to use Bland-Altman methods to determine whether there was also an effect on limits of agreement.
To estimate the difference between SpO2 and SaO2 attributable to different oximetry devices, variance components and the associated intra-class correlation coefficients for the effect of oximeters, as well as best linear unbiased predictors of the effect of individual oximeters, were estimated using mixed linear models fitted by restricted maximum likelihood. SAS version 9.4 was used.
The planned sample size of 400 was based on three considerations. Firstly, for the analysis of variables that predict the size of the bias we sought to have between 20 and 40 participants for each degree of freedom in the ANOVA. Based on the six variables, some of which have multiple levels, this required between 200 and 400 participants. Secondly, the estimates of paired SD for the SpO2 to SaO2 difference from patients in a range of clinical settings were 0.55% [6], 2.1% [17], and 2.2% [16]. There is 80% power, with a type I error rate of 5%, to detect a SpO2 to SaO2 difference of 2% for any of the variables that might predict bias, if there were two equal sized groups of 21 participants. Thirdly, for estimation of variance components for the different pulse oximeters by best linear unbiased predictors, between 20 and 25 participants per oximeter brand were required, and it was estimated that between 10 and 20 oximeter brands would be used.
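The power statement above can be checked approximately with the standard two-group sample size formula. The sketch below uses a normal approximation, which is an assumption: the text does not say how the figure of 21 per group was derived, and an exact t-based calculation requires slightly more participants than the normal approximation. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Participants per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# SD estimates quoted in the text; detectable difference of 2 percentage points.
for sd in (0.55, 2.1, 2.2):
    print(f"SD {sd}%: about {n_per_group(delta=2.0, sd=sd)} per group")
```

With the largest quoted SD of 2.2%, this approximation gives 19 participants per group, which is consistent with the 21 per group stated above being sufficient.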

Participants
Four hundred patients were recruited: 253 from Christchurch, 103 from Wellington and 44 from Westmead Hospital (Fig. 1). Participant characteristics and details of the pulse oximeters and ABG analysers are presented in Table 2.

Agreement between SpO2 and SaO2
The bias for SpO2 minus SaO2 was −1.2%, with limits of agreement −4.4 to 2.0%. The Bland-Altman plot is shown in Fig. 2. In 10/400 (2.5%) participants the SpO2 was at least 4% lower than SaO2. In one of these participants the investigator reported concern with oximeter accuracy. In 3/400 (0.8%) participants the SpO2 was at least 4% higher than the SaO2. In one of these participants the investigator reported concern with oximeter accuracy. Characteristics of these participants are in the Online Additional File (Additional file 1: Table S2).

Factors potentially influencing oximeter accuracy
There was no statistical evidence of an association between SaO2 and the bias between SpO2 and SaO2; Spearman coefficient 0.003, P = 0.94. Of the other factors from Table 1, only a diagnosis of diabetes was identified as a predictor of bias (P = 0.05). In participants with diabetes the bias was −0.8% (95% limits of agreement −4.4 to 2.8%); in those without diabetes it was −1.2% (−4.4 to 2.0%). Detailed results are presented in the Online Additional File (Additional file 1: Figure S3 and Table S5).
There were at least 14 different oximeter models used. The most common oximeter models used were the [...] (Table S1 for all models). The estimated variance components were 0.16 for oximeter brand and 2.48 for residual, resulting in an intra-class correlation coefficient of 0.06. This can be interpreted as approximately 6% of variation in the relationship between SpO2 and SaO2 being due to oximeter brand. Detailed results by oximeter are shown in the Online Additional File (Additional file 1: Table S6).
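The share of variation attributed to oximeter brand follows directly from the two variance components quoted above. A minimal sketch, assuming the usual definition of the intra-class correlation as the between-group variance divided by the total variance (the function name `icc` is hypothetical):

```python
def icc(between_var, residual_var):
    """Intra-class correlation: proportion of total variance between groups."""
    return between_var / (between_var + residual_var)


# Variance components quoted in the text: oximeter brand and residual.
brand_share = icc(between_var=0.16, residual_var=2.48)
print(f"Share of variance due to oximeter brand: {brand_share:.1%}")
```

With these inputs the ratio is 0.16 / 2.64, or about 0.06, which matches the interpretation that approximately 6% of the variation is due to oximeter brand.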
Concern with oximeter accuracy was reported by investigators in 16 patients, nine of whom had nail polish, acrylic nail or double nail. Other causes for concern are presented in the Online Additional File (Additional file 1: Table S1).

Discussion
The bias and limits of agreement between SpO2 and SaO2 suggest that pulse oximetry is an accurate method to assess SaO2 in most adult patients in the clinical setting. However, in a small number of participants potentially clinically important differences between SpO2 and SaO2 could affect patient assessment and management. A practical guide that can be derived from these data is that a SpO2 ≥ 92% effectively rules out presence of hypoxaemia, indicated by a SaO2 < 90%. There were no clinically significant differences in oximeter accuracy based on absolute level of SaO2, hospital location, numerous clinical characteristics or oximeter brand.
The magnitude of bias and associated limits of agreement from the range of oximeters in this study suggested that overall they perform at a similar level to, or better than, oximeters used in many of the clinical studies performed in the last 10 years [5, 6, 8, 10-12, 18-26, 28, 30, 31]. This is in keeping with continual improvements in oximeter sensor technology and software by manufacturers over time [3]. Specifically, the bias and limits of agreement for SaO2 minus SpO2 were similar to the values recently obtained in critically unwell patients in the ICU setting (0.15%, limits of agreement plus or minus 4.4%) [18].
The negative bias of −1.2%, albeit small, meant that the oximeters tended to underestimate SaO2. Such underestimation has the potential to result in a conservative estimate of risk of hypoxaemia and may lead to more liberal oxygen therapy than required. SpO2 underestimated SaO2 by at least 4% in around 3% of participants, and overestimated it by at least 4% in less than 1% of participants. These findings mean that while the oximeters performed well overall, there were still potentially clinically relevant differences in SpO2 and SaO2 in a small proportion of the participants. In the majority of the participants with SpO2 and SaO2 values differing by at least 4% the investigators did not state they had any concerns with oximeter accuracy. This highlights the potential difficulty in identifying when an oximetry value is incorrect and emphasises the importance of guideline recommendations to consider oximetry values in clinical context [3].
The Thoracic Society of Australia and New Zealand (TSANZ) [2] and British Thoracic Society (BTS) [1] guidelines for acute oxygen therapy both recommend use of pulse oximetry as a vital sign and as a tool to titrate oxygen therapy to a target oxygen saturation range. The TSANZ recommends that oxygen is delivered to a SpO2 target range of 92 to 96% in patients not at risk of hypercapnic respiratory failure [2]. This range was developed to reduce the risks of both hyperoxaemia and hypoxaemia, while recognising potential oximeter accuracy limitations [35]. The lower limit of 92% is supported by the finding that a SpO2 ≥ 92% indicated that hypoxaemia (SaO2 < 90%) was not present. The recommended upper SpO2 limit of 96%, aimed at avoiding hyperoxaemia, is supported by the finding that 12 of the 13 participants with a PaO2 of greater than 100 mmHg had a SpO2 value over 96%. A SpO2 < 90% had a sensitivity of only 70.5% in identifying a PaO2 < 60 mmHg, while for SaO2 < 90% it was only 54.1%. These values are in keeping with the majority of participants being positioned to the left of the predicted oxygen haemoglobin dissociation curve, where saturation is higher than predicted for a given PaO2. In keeping with recommendations by the TSANZ Oximetry Guidelines [3], these findings highlight the limitations of estimating PaO2 from saturation values, and vice versa.
Patients with sickle cell anaemia, methaemoglobinaemia, or CO poisoning were excluded from the study and nail polish was removed where possible, as these factors are well established to affect oximeter results [3]. SaO2, oximeter model and the numerous clinical variables were not found to significantly affect oximeter accuracy. However, it was not possible to evaluate the effect of earlobe oximetry, Fitzpatrick scale V or VI, or ED location on accuracy due to there being only one participant in each of these categories. This study had the advantage of a multicentre design and use of a range of oximeters routinely available to clinical staff in a variety of hospital settings. A wide range of adult patients were included, both in terms of presenting diagnosis and illness severity. While there was a range of SaO2 values between 72 and 100%, the results cannot be applied to patients with a SaO2 of under 70%, at which oximeter inaccuracy is well recognised [3]. Results may not be applicable to paediatric patients or adult patients in theatre, ICU or ED, especially as a variety of factors specific to these patients have been previously identified as affecting oximeter accuracy [11,15,17,25,27,28,30,31]. Having only one participant with a Fitzpatrick score of V, and none with VI, meant study findings may not be applicable to patients with higher skin pigmentation. This is especially important as oximeter accuracy has been demonstrated to decrease as pigmentation increases, particularly at lower SaO2 levels and in oximeters of the same brand as some of those used in our study (Masimo Radical and Nonin 9700) [7].
A single oximeter and ABG measurement pairing from each participant was used, which has the advantage of removing potential bias from repeated measures in the same participant. However, this did mean we could not specifically assess the accuracy of SpO2 to detect changes in SaO2 over time.

Conclusions
Overall, the oximeters in this study had good accuracy in determining individual SaO2 values and detecting hypoxaemia in a range of clinical settings. The use of a SpO2 of 92% as the lower boundary for the titration of oxygen therapy was supported by 100% sensitivity for SpO2 < 92% in identifying hypoxaemia (SaO2 < 90%). In a small number of participants discrepancies between SpO2 and SaO2 could have implications for patient assessment and management. This highlights the importance of interpreting SpO2 within clinical context.
Additional file 1. This is an online supplement containing additional details and data as per the manuscript text.