During the process of cognitive interviewing, it was suggested that the question of BMI be broken down into its components, body mass and height, which would make it easier to use and understand. We chose to adopt this suggestion. We were unable to find any other validation studies of the SBQ where this was done.
Our translation of the STOP-Bang questionnaire showed good temporal stability: the intraclass correlation coefficient between test and retest total scores was high.
Internal consistency was evaluated with two statistical methods. The first was Cronbach’s alpha, which was somewhat low at 0.63. This is on par with 0.62 for the Brazilian translation [15], close to 0.7 for the Arabic translation [16] and higher than the 0.41 for the Lithuanian translation [17]. We also performed factor analysis, which is more suitable for dichotomous variables [18, 19] such as those found in the SBQ. This showed good internal consistency for six of the questionnaire’s eight items. Item two, “Do you often feel tired, fatigued or sleepy during the daytime?”, and item six, “Age older than 50?”, had loading scores below the threshold of 0.3. For item two, this could be explained by causes of tiredness other than OSA, for instance depression, which are common in the general population. Another cause could be that some patients with OSA do not experience excessive daytime sleepiness [20]. The low factor loading of item six, which refers to age, could perhaps be explained by the fact that the study population consisted mostly of middle-aged subjects; a larger sample might have been more informative. Dr. Chung, who designed the questionnaire, did not evaluate internal consistency, reasoning that the questionnaire reflected four different dimensions of OSA morbidity and that internal consistency checking was, thus, not applicable [21]. Internal consistency checking was nevertheless performed in certain validation studies [7, 16] and omitted in others [17, 22, 23]. When it was carried out, Cronbach’s alpha was used, and values were typically low.
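For readers less familiar with the statistic, the following is a minimal sketch of how Cronbach’s alpha can be computed for dichotomous (0/1) items such as those of the SBQ. It is an illustration only, not the analysis code used in this study; the function name `cronbach_alpha` and the example responses are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: list of respondents, each a list of k item scores
    (0/1 for dichotomous items). Hypothetical helper for illustration.
    """
    k = len(rows[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    # Variance of the total score across respondents.
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of four respondents to three yes/no items.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(example), 2))
```

Alpha rises when items vary together, so a questionnaire whose items deliberately capture different dimensions of a condition can be expected to yield a modest value.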
The prevalence of OSA (AHI of ≥ 5) in our sleep clinic population was 69.6%. For the Portuguese version, the prevalence was 78% [7], for the Lithuanian, this was 93% [17], for the Arabic, it was 94% [16] and for the Malayan, it was 100% [23].
A comparison of the answers given by patients with and without OSA (AHI ≥ 5) showed significant differences in all but one of the eight questions: the question referring to a BMI ≥ 35, for which the p value was 0.113. This was somewhat surprising considering that OSA has been reported in over 40% of persons with a BMI of more than 30 [20]. Nevertheless, of the 44 patients with a BMI ≥ 35, only 9 did not have OSA. Interestingly, Reis et al. [7] also found BMI to be statistically nonsignificant. The specific cutoff value used for the BMI might be the cause. This interpretation is supported by Fig. 3 and Table 5, which show a correlation between BMI and AHI.
For an SBQ score of 3, we found that the area under the ROC curve (AUC) was high at 0.757 (95% CI 0.692–0.823; p < 0.001) for all OSA (AHI ≥ 5). This increased slightly for moderate/severe and severe OSA to 0.768 (95% CI 0.711–0.825; p < 0.001) and 0.77 (95% CI 0.704–0.836; p < 0.001), respectively. The AUC for all OSA (AHI ≥ 5) obtained by Reis et al. [7] was slightly higher than ours at 0.806 (95% CI 0.730–0.881), but slightly lower for moderate/severe OSA (AHI ≥ 15) at 0.730 (95% CI 0.661–0.798) and for severe OSA at 0.728 (95% CI 0.655–0.801).
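As an aid to interpretation, the AUC can be read as the probability that a randomly chosen patient with OSA has a higher questionnaire score than a randomly chosen patient without OSA, with ties counted as one half. The sketch below computes it in this way (the Mann-Whitney formulation). It is an illustration under that assumption, not the procedure used in this study; the function name `auc_mann_whitney` and the scores are hypothetical.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    scores_pos: scores of patients with the condition.
    scores_neg: scores of patients without the condition.
    Ties contribute 0.5. Hypothetical helper for illustration.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical questionnaire scores, not study data.
with_osa = [3, 4, 5]
without_osa = [1, 2, 3]
print(round(auc_mann_whitney(with_osa, without_osa), 3))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.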
Among patients referred to the sleep clinic, the Slovenian version of the SBQ, at a score of 3, showed a high sensitivity of 92.1% (86.9–95.7%) and a moderate specificity of 44.4% (32.7–56.6%) for all OSA (AHI ≥ 5). This was on par with benchmarks such as Chung et al. [21], who reported a sensitivity of 72.1% and a specificity of 38.2%, and Silva et al. [8], who reported a sensitivity of 82.0% and a specificity of 43.3% for the same range and cutoff. Low specificity was also observed in a number of translations [7, 16, 17] as well as in the meta-analysis by Nagappa et al. [9].
The PPV for an SBQ score of 3 for any OSA (AHI ≥ 5) was high at 79.2% (75.5–82.4%). Specificity and PPV increased continuously with every increase in the SBQ score. These results were on par with other translations of the SBQ [7, 24]. High sensitivity and PPV are essential for screening tools, but it could be argued that NPV is perhaps even more important for risk stratification. Our results show that a STOP-Bang score of 2 had an NPV of 80.0% (46.6–94.8%) for all OSA (AHI ≥ 5) and 100.0% for moderate/severe (AHI ≥ 15) and severe OSA (AHI ≥ 30). Although our NPV might be higher due to an underestimation of the AHI brought about by the high percentage of sleep studies conducted with PG, the results are similar to those obtained by Portuguese researchers [7].
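For clarity, the four measures discussed above all derive from the same 2 × 2 table of questionnaire result against sleep study result. The following minimal sketch shows the arithmetic. The counts are hypothetical round numbers chosen for illustration and are not the counts of this study; the function name `diagnostic_metrics` is likewise an assumption.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Screening-test measures from a 2 x 2 table.

    tp: screen positive, condition present
    fn: screen negative, condition present
    fp: screen positive, condition absent
    tn: screen negative, condition absent
    Hypothetical helper for illustration.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased who screen positive
        "specificity": tn / (tn + fp),  # share of non-diseased who screen negative
        "ppv": tp / (tp + fp),          # share of screen positives who are diseased
        "npv": tn / (tn + fn),          # share of screen negatives who are non-diseased
    }


# Hypothetical counts, not study data.
for name, value in diagnostic_metrics(tp=90, fn=10, fp=30, tn=20).items():
    print(f"{name}: {value:.1%}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of OSA in the population tested, which is one reason why values from sleep clinic populations with differing prevalence are not directly comparable.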
An SBQ score of 3 was chosen as the recommended cutoff. This is in line with other recent translations [7, 16, 17, 23].
An important study limitation was that the population referred to the sleep clinic was in a sense already pre-screened by referring practitioners. Our findings, thus, cannot be extended to other settings.
In our study, we primarily used PG, which accounted for 97.8% of all recordings. PG devices do not include sleep staging and can give a lower AHI than PSG, in which periods of wakefulness are excluded from the calculation of the AHI [25]. PG has, however, been shown to be a reliable alternative to PSG and is becoming ever more prevalent in clinical practice [7, 26]. Ours was not the first study to have used PG for validation of the STOP-Bang questionnaire [7].