
Evaluation of intra- and interobserver reliability in the assessment of the ‘critical trochanter angle’



The recently described ‘critical trochanter angle’ (CTA) is a novel parameter in the preoperative risk assessment of stem malalignment in total hip arthroplasty. As its reproducibility needs to be evaluated, this study aims to investigate its intra- and interobserver reliability. It is hypothesized that both analyses justify the clinical use of the CTA.


A total of 100 pelvic radiographs obtained prior to total hip arthroplasty were retrospectively reviewed by four observers with different levels of clinical experience. The CTA was measured twice by each observer on different occasions using the previously described technique. Intra- and interobserver reliability were evaluated using intraclass correlation coefficients (ICC) with confidence intervals (CI) and the Bland–Altman approach.


The mean CTA in the first and second measuring sequences was 20.58° and 20.78°, respectively. The observers’ means ranged from 17.76° to 25.23°. Intraobserver reliability showed a mean difference of less than 0.5° for all four observers (95% limits of agreement: −7.70° to 6.70°). Intraobserver ICCs ranged from 0.92 to 0.99 (CI 0.88–0.99). For interobserver variation analysis, ICCs of 0.83 (CI 0.67–0.90) and 0.85 (CI 0.68–0.92) were calculated.


Analyses concerning intra- and interobserver reliability in the assessment of the CTA showed ‘very good’ and ‘good’ results, respectively. In view of these findings, the use of the CTA as an additional preoperative parameter to assess the risk of intraoperative stem malalignment seems to be justified.


Preoperative planning is mandatory when performing total hip arthroplasty (THA) because it reduces the risk of inaccurate biomechanical reconstruction and may also prevent over- and undersizing of implant components [1,2,3]. Incorrect offset reconstruction must be avoided as it carries the risks of altered leg length and postoperative gluteal insufficiency [4]. In this context, intraoperative component positioning is of the utmost importance. With regard to stem orientation in THA, several influencing factors have been identified, among them the surgical approach, implant design, femoral broach shape, the surgeon’s level of experience and the presence of deformities such as dysplasia [5,6,7,8,9]. Varus stem alignment in particular has been correlated with the following risk factors: low centrum collum diaphyseal angle (CCD) in coxa vara, long femoral neck anatomy, greater trochanteric height, a lower canal-flare index and distinct trochanter overhang [5, 10]. With the first description of the ‘critical trochanter angle’ (CTA), a further parameter was recently introduced for preoperative risk assessment of stem malalignment [11]. This novel geometric angle does not measure the trochanter overhang alone, but the overhang in relation to the femoral shaft axis. Moreover, it is independent of the individual size of the hip. A preoperative CTA of 22.75° or less predicted varus stem alignment of two degrees or more with a sensitivity of 90% and a specificity of 80% [11].

As with all new parameters that may affect diagnostics, treatment or therapy outcome, the reproducibility and reliability of the CTA have to be determined in order to justify its use in everyday clinical practice. Therefore, this study aims to investigate the intra- and interobserver reliability of the CTA.


For retrospective analysis, 100 preoperative conventional pelvic radiographs of patients with unilateral coxarthrosis were evaluated. Radiographic evaluation confirmed osteoarthritis stage 3 or 4 according to Kellgren and Lawrence in each case [12]. All patients underwent THA at the same institution (EndoCert® certified centre of arthroplasty) between 2012 and 2015. Only collarless straight tapered stems (Corail® type) and cementless hemispheric cups were used, implanted via the direct lateral Hardinge approach. Operative interventions were exclusively performed by EndoCert®-approved high-volume surgeons with > 100 THAs per year.

For evaluation in this study’s context, only standardized anteroposterior (ap) pelvic radiographs centred over the pubic symphysis were reviewed. Quality control was ensured by systematic presentation and evaluation of all X-ray images in weekly radiologic reviews with mandatory participation of the medical staff. Final selection for inclusion in the study was made by the first and last author (each with 10 years of experience). Radiographs showing previous fractures, abnormal head–neck anatomy or ossifications close to the trochanter were excluded (n = 8). Radiographs of poor quality, e.g. no true ap-setting, were also excluded from the study (n = 9). In order to obtain the target quantity of 100 measurable radiographs, 115 radiographs had to be assessed in total (Fig. 1).

Four of the five authors, all members of the Department of Orthopaedics & Orthopaedic Surgery of the Saarland University Medical Centre or the Department of Orthopaedics & Traumatology of the University of Duisburg-Essen, acted as observers. Two of them were tenth-year consultants [SS (observer 1) and MH (observer 2)], whereas the other two were fourth-year [MS (observer 3)] and second-year [IZ (observer 4)] residents. Due to their work on the first description of the CTA, observers 1 and 2 were familiar with the measurement technique and instructed observers 3 and 4 in the method.

Assessment of the pelvic radiographs was carried out using the mediCAD® planning software (mediCAD Hectec GmbH, Altdorf, Germany). The CTA was measured as described by Haversath et al. First, the vertex of the angle, located at the intersection of the femoral shaft axis and the femoral neck axis, was identified. Then, the CTA was measured between the shaft axis and a second leg passing through the vertex between the lateral and superoposterior facets of the greater trochanter (Fig. 2) [11].

The CTA was determined twice by each observer on two different occasions, and the order of the patients was changed randomly before the second measurement. Furthermore, the observers were blinded to the patients’ clinical information, to the other observers’ results and to their own previous measurements. Additionally, they were not given any feedback between the observations.
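The geometric idea behind the measurement can be illustrated with a minimal sketch. It is not the mediCAD® procedure; it only computes the angle at a vertex between two rays from landmark coordinates. The function name, the coordinate convention and the example points are hypothetical, and which direction along the shaft axis is used as the first leg is an assumption that has to follow the original description [11].

```python
import math


def angle_at_vertex(vertex, shaft_point, trochanter_point):
    """Return the unsigned angle in degrees at `vertex` between the ray
    towards `shaft_point` (a point on the femoral shaft axis) and the ray
    towards `trochanter_point` (the facet vertex of the greater trochanter).

    All points are (x, y) tuples in image coordinates (hypothetical input).
    """
    ax, ay = shaft_point[0] - vertex[0], shaft_point[1] - vertex[1]
    bx, by = trochanter_point[0] - vertex[0], trochanter_point[1] - vertex[1]
    dot = ax * bx + ay * by       # proportional to the cosine of the angle
    cross = ax * by - ay * bx     # proportional to the sine of the angle
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical landmark coordinates, chosen only to give an easy result.
example_angle = angle_at_vertex((0.0, 0.0), (0.0, 1.0), (1.0, 1.0))
print(f"{example_angle:.2f} degrees")
```

Using atan2 on the cross and dot products avoids the loss of precision that acos shows for very small angles, which matters little at angles around 20° but costs nothing.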

Fig. 1 Flowchart demonstrating the inclusion/exclusion of radiographs to obtain the target quantity of n = 100

Fig. 2 Measurement of the ‘critical trochanter angle’ (CTA) as described by Haversath et al. [11]

Descriptive and comparative statistical analysis was performed using SPSS® Statistics (Version, IBM®). Normal distribution was checked by means of the Kolmogorov–Smirnov test and confirmed for all samples. The two measurement series of each observer were tested for significant differences using the one-sample t-test. Agreement between measurements of a continuous variable (here the CTA) across multiple observers can be assessed with the intraclass correlation coefficient (ICC) and the Bland–Altman plot [13]. To evaluate intraobserver reliability, the mean difference between the two measurements of each observer was calculated and analysed in relation to the 95% limits of agreement [14,15,16]. For visualization, the differences were plotted against the mean measurements as described by Bland and Altman. Intra- and interobserver reliability were tested by means of the ICC and its 95% confidence interval (CI) [17, 18], using the two-way random model with absolute agreement [19].
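As an illustration of the two-way random, absolute-agreement ICC named above, the following sketch computes the single-measure form, often written ICC(2,1) or ICC(A,1), from the ANOVA mean squares given by Shrout and Fleiss [17] and McGraw and Wong [18]. The text does not state whether single or average measures were used, so the single-measure form is an assumption, and the function name and example values are hypothetical. The sketch does not reproduce the SPSS® output or the confidence intervals.

```python
def icc_two_way_random_absolute(ratings):
    """Single-measure ICC for a two-way random model with absolute agreement.

    `ratings` is a list of rows, one row per radiograph (subject) and one
    column per observer or measurement occasion; every row has k values.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical angles: the second rater reads 2 degrees higher throughout.
offset_example = icc_two_way_random_absolute([[10, 12], [20, 22], [30, 32]])
print(round(offset_example, 4))
```

Because absolute agreement is requested, a constant offset between raters lowers the coefficient even though the two series are perfectly correlated; a consistency-type ICC would return 1 for the same data.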


Intraobserver reliability

Between each observer’s first and second measuring sequence, no significant differences in the CTA values could be detected, with p-values ranging from 0.21 to 0.68. The mean difference between both test series was less than 0.5° for all observers, with the 95% limits of agreement ranging from −7.70° to 6.77°. Intraobserver ICCs ranged from 0.92 to 0.99 (Table 1). The Bland–Altman plots illustrate the proximity achieved between the two measuring sequences by plotting the differences between the two measurements of each observer against their mean values (Figs. 3, 4, 5 and 6). They show that the measurements of observer 4 are characterized by distinctly greater scatter and wider 95% limits of agreement than those of the other observers.
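The quantities reported here (mean difference and 95% limits of agreement) can be sketched as follows. This is a minimal illustration of the standard Bland–Altman calculation [14], assuming the usual mean ± 1.96 standard deviations of the paired differences; the function name and the example angles are hypothetical and are not study data.

```python
import statistics


def bland_altman(first, second):
    """Return the mean difference (bias), the 95% limits of agreement and
    the (mean, difference) pairs used for a Bland-Altman plot."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)   # sample standard deviation (n - 1)
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "points": list(zip(means, diffs)),
    }


# Hypothetical repeated CTA readings of one observer, in degrees.
result = bland_altman([20.0, 22.0, 18.0, 25.0], [21.0, 21.0, 19.0, 24.0])
print(result["bias"], result["lower"], result["upper"])
```

Plotting the second element of each pair in `points` against the first gives the kind of diagram shown in the intraobserver figures.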

Table 1 Intraobserver variation of observers 1–4 between the first and second measurement of the ‘critical trochanter angle’ (CTA)

Fig. 3 Intraobserver variation of the ‘critical trochanter angle’ (CTA) for observer 1; solid line: mean value of measurements; dotted lines: 95% limits of agreement above and below the mean value

Fig. 4 Intraobserver variation of the ‘critical trochanter angle’ (CTA) for observer 2 (for explanations see Fig. 3)

Fig. 5 Intraobserver variation of the ‘critical trochanter angle’ (CTA) for observer 3 (for explanations see Fig. 3)

Fig. 6 Intraobserver variation of the ‘critical trochanter angle’ (CTA) for observer 4 (for explanations see Fig. 3)

Interobserver reliability

The mean CTA across all four observers was 20.58° (observer means ranging from 17.76° to 25.06°) for the first and 20.78° (observer means ranging from 18.22° to 25.23°) for the second measurement. Interobserver correlation analysis for all four observers showed an intraclass correlation coefficient (ICC) of 0.83 (CI 0.67–0.90) for the first and 0.85 (CI 0.68–0.92) for the second test series (Table 2).

Table 2 Interobserver correlation of the ‘critical trochanter angle’ (CTA) for both measuring sequences


The CTA is a novel parameter which helps to evaluate the risk for intraoperative stem malpositioning in THA. According to the authors, its determination provides further and possibly more valuable information in comparison to existing parameters such as the CCD [11].

Rather than focusing merely on correlation, Bland and Altman described a statistical approach for evaluating the agreement between two different measurements of the same quantity, emphasizing the need to collect replicated data by performing repeated measurements [14, 20].

In this study, no significant differences were found between the two measurement series of each observer, indicating consistent data. The two measurements by each observer showed a mean difference of less than 0.5°, indicating very good repeatability. This is confirmed by the intraobserver ICCs, which ranged from 0.92 to 0.99 for all observers and thus show a ‘very good’ correlation according to the interpretation recommended by Cicchetti and Koo & Li [21, 22]. The Bland–Altman plots for all four observers demonstrate the proximity between the first and second measurement and reveal only a few outliers beyond the 95% limits of agreement. Additionally, the values are distributed homogeneously above and below the mean difference line as well as across the range of mean CTA values. Therefore, a proportional bias, indicated by a trend above or below the mean difference or towards higher or lower CTA values in general, seems rather unlikely [15]. Comparing the four plots with one another, the measurements of observer 4 appear to be scattered more widely. This is substantiated by a distinctly wider range of measured values and a greater standard deviation compared to the other observers. There is thus at least some indication that clinical experience plays a role in accurate assessment of the CTA, as observer 4 was a second-year resident and the least experienced of all observers [23, 24].

Regarding interobserver variation, mean CTA values between 17.76° and 25.23° were found. Observer 1 in particular showed a tendency towards greater CTA values compared to the other observers. However, the intraclass correlation coefficients for interobserver reliability of the two measuring sequences were 0.83 and 0.85, respectively. Again, according to the interpretations suggested by Cicchetti and Koo & Li, the results of this study indicate a ‘good’ (Koo & Li) to ‘very good’ (Cicchetti) interobserver reliability in the assessment of the CTA [21, 22].
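Why the same interobserver ICC falls into different categories under the two guidelines can be shown with a short sketch of their published cut-offs. The thresholds below follow Cicchetti [21] and Koo & Li [22]; both guidelines name their top category ‘excellent’, which is taken here to correspond to the ‘very good’ used in this article (an assumption about wording). The function names and the handling of values exactly on a boundary are illustrative choices.

```python
def cicchetti_category(icc):
    """Category according to Cicchetti (1994): <0.40 poor, 0.40-0.59 fair,
    0.60-0.74 good, 0.75-1.00 excellent."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


def koo_li_category(icc):
    """Category according to Koo & Li (2016): <0.50 poor, 0.50-0.75 moderate,
    0.75-0.90 good, >0.90 excellent."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# ICC values reported in this study: interobserver 0.83 and 0.85,
# intraobserver range 0.92 to 0.99.
for value in (0.83, 0.85, 0.92, 0.99):
    print(value, cicchetti_category(value), koo_li_category(value))
```

The interobserver values of 0.83 and 0.85 reach the top category only under the Cicchetti cut-offs, whereas the intraobserver values reach it under both.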

This study has several possible limitations. The quality of the pelvic radiographs is crucial for accurate measurements. Despite critical assessment of the radiographs before measuring the CTA, a bias cannot be completely excluded. All observers in this study were orthopaedists or orthopaedic surgeons. Representatives of other medical disciplines, such as radiologists, might have obtained different results [23]. However, as the CTA is intended as a measure to estimate the risk of varus stem alignment, it is likely to be used primarily by orthopaedic surgeons as part of preoperative planning. Finally, it must be taken into account that the intra- and interobserver reliability of the CTA has not been assessed before. As there are no similar studies with which the present results can be compared, critical evaluation of their significance is not possible. Concerning the clinical relevance of this study’s findings, it has to be pointed out that preoperative measurement of the CTA only allows a risk assessment of possible varus stem alignment due to bony characteristics. In a multifactorial setting, further parameters known to affect intraoperative implant positioning, such as surgical approach, implant design, the surgeon’s skills and deformities, still have to be considered in order to achieve desirable postoperative results [5,6,7,8,9].


The intra- and interobserver reliability of the CTA is ‘very good’ and ‘good’. Therefore, the CTA is a valuable and reproducible preoperative parameter for determining the risk for stem malalignment in THA due to bony characteristics. However, the individual observer’s level of experience in evaluating pelvic radiographs may affect the quality of CTA measurements. This is the first study to investigate the intra- and interobserver reliability in the assessment of the CTA.

Availability of data and materials

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.



Abbreviations

CTA: Critical trochanter angle

ICC: Intraclass correlation coefficient

CI: Confidence interval

THA: Total hip arthroplasty

CCD: Centrum collum diaphyseal angle

SD: Standard deviation


  1. Della Valle AG, Padgett DE, Salvati EA. Preoperative planning for primary total hip arthroplasty. J Am Acad Orthop Surg. 2005;13:455–62.

  2. González Della Valle A, Slullitel G, Piccaluga F, Salvati EA. The precision and usefulness of preoperative planning for cemented and hybrid primary total hip arthroplasty. J Arthroplasty. 2005;20:51–8.

  3. Barrack RL, Burnett RSJ. Preoperative planning for revision total hip arthroplasty. Instr Course Lect. 2006;55:233–44.

  4. Flecher X, Ollivier M, Argenson JN. Lower limb length and offset in total hip arthroplasty. Orthop Traumatol Surg Res. 2016;102:S9–20.

  5. Batailler C, Fary C, Servien E, Lustig S. Influence of femoral broach shape on stem alignment using anterior approach for total hip arthroplasty: a radiologic comparative study of 3 different stems. PLoS ONE. 2018;13:e0204591.

  6. Haversath M, Lichetzki M, Serong S, Busch A, Landgraeber S, Jäger M, Tassemeier T. The direct anterior approach provokes varus stem alignment when using a collarless straight tapered stem. Arch Orthop Trauma Surg. 2020.

  7. Klug A, Gramlich Y, Hoffmann R, Pfeil J, Drees P, Kutzner KP. Epidemiologische Entwicklung der Hüftendoprothetik in Deutschland—Wo stehen wir aktuell? Z Orthop Unfall. 2019.

  8. Rowan FE, Benjamin B, Pietrak JR, Haddad FS. Prevention of dislocation after total hip arthroplasty. J Arthroplasty. 2018.

  9. Greber EM, Pelt CE, Gililland JM, Anderson MB, Erickson JA, Peters CL. Challenges in total hip arthroplasty in the setting of developmental dysplasia of the hip. J Arthroplasty. 2017;32:S38–44.

  10. Murphy CG, Bonnin MP, Desbiolles AH, Carrillon Y, Aїt Si Selmi T. Varus will have varus; a radiological study to assess and predict varus stem placement in uncemented femoral stems. Hip Int. 2016;26:554–60.

  11. Haversath M, Busch A, Jäger M, Tassemeier T, Brandenburger D, Serong S. The “critical trochanter angle”: a predictor for stem alignment in total hip arthroplasty. J Orthop Surg Res. 2019;14:165.

  12. Kohn MD, Sassoon AA, Fernando ND. Classifications in brief: Kellgren-Lawrence classification of osteoarthritis. Clin Orthop Relat Res. 2016;474:1886–93.

  13. Ranganathan P, Pramesh CS, Aggarwal R. Common pitfalls in statistical analysis: measures of agreement. Perspect Clin Res. 2017;8:187–91.

  14. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–10.

  15. Giavarina D. Understanding Bland Altman analysis. Biochem Med (Zagreb). 2015;25:141–51.

  16. Sedgwick P. Limits of agreement (Bland-Altman method). BMJ. 2013;346:f1630.

  17. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–8.

  18. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996;1:30–46.

  19. Mehta S, Bastero-Caballero RF, Sun Y, Zhu R, Murphy DK, Hardas B, Koch G. Performance of intraclass correlation coefficient (ICC) as a reliability index under various distributions in scale reliability studies. Stat Med. 2018;37:2734–52.

  20. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–60.

  21. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6:284–90.

  22. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15:155–63.

  23. Carlisle JC, Zebala LP, Shia DS, Hunt D, Morgan PM, Prather H, et al. Reliability of various observers in determining common radiographic parameters of adult hip structural anatomy. Iowa Orthop J. 2011;31:52–8.

  24. Schottel PC, Park C, Chang A, Knutson Z, Ranawat AS. The role of experience level in radiographic evaluation of femoroacetabular impingement and acetabular dysplasia. J Hip Preserv Surg. 2014;1:21–6.


We acknowledge support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) and Saarland University within the funding programme Open Access Publishing.


Open Access funding enabled and organized by Projekt DEAL. There is no funding source.



SS measured radiographs, performed statistical analysis and wrote the manuscript. MH measured radiographs and contributed in writing the manuscript. MS and IZ measured radiographs. SL and MJ revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sebastian Serong.

Ethics declarations

Ethics approval and consent to participate

Ethical approval was obtained from the local ethics committee for this retrospective study (Reference: 16-6828-BO).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Serong, S., Schutzbach, M., Zovko, I. et al. Evaluation of intra- and interobserver reliability in the assessment of the ‘critical trochanter angle’. Eur J Med Res 25, 67 (2020).
