University of Groningen
Grip on prognostic factors after forearm fractures
Ploegmakers, Joris Jan Willem
Publication date: 2019
Citation for published version (APA):
Ploegmakers, J. J. W. (2019). Grip on prognostic factors after forearm fractures. Rijksuniversiteit Groningen.
Joris J.W. Ploegmakers, Konrad Mader, Dietmar Pennig
Chapter 2
Four distal radial fracture classifications tested among a large panel of Dutch trauma surgeons
ABSTRACT
Forty-five observers (trauma surgeons and residents) classified five different radiographs of distal radial fractures according to the AO/ASIF, Frykman, Fernandez and Older classifications. Four months later, the same panel classified the same radiographs in a different order. Mean interobserver correlation for all cases was fair to moderate according to the Spearman rank test. However, these classifications showed poor correlation with the gold standard as classified by the senior author (DP).
Intraobserver agreement was moderate for the AO/ASIF (Kw = 0.52) and Fernandez (Kw = 0.42) classifications and fair for the Frykman (Kw = 0.26) and Older (Kw = 0.27) classifications. When the group was divided based on years of clinical experience (< 6 years; ≥ 6 years), there was a poor correlation between experience and consistency for all four classifications. Based on these findings, we do not recommend these classifications for clinical application because of their questionable reproducibility and reliability.
INTRODUCTION
Classification systems have been developed to gain greater insight into the trauma mechanism, treatment and prognosis of distal radial fractures. The international literature describes about twenty different classification systems for wrist fractures; we selected popular classifications with different rationales. In previous studies, small panels have categorised many radiographs with one or several classification systems at different points in time, in order to establish the reliability and reproducibility of these classifications (Andersen et al., 1991; Kreder et al., 1996; Andersen et al., 1996; Flikkila et al., 1998; Illarramendi et al., 1998; Oskam et al., 2001). The goal of the present study was to gain insight into reliability and agreement with a digital questionnaire for four classifications – AO (Johnstone et al., 1993; Kreder et al., 1996; Flikkila et al., 1998; Illarramendi et al., 1998), Frykman (Frykman, 1967), Older (Andersen et al., 1991) and Fernandez (Fernandez, 1993) – using a large panel and few radiographs on two occasions. In DiRECT (Distal Radial fracture Electronic Classification Trial), both interobserver and intraobserver reproducibility was scored.
MATERIAL AND METHODS
After obtaining authorisation from the boards of both Dutch trauma societies (the Nederlandse Vereniging voor Traumatologie (NVT) for general surgeons and the Nederlandse Vereniging voor Orthopaedische Traumatologie (NVOT) for orthopaedic surgeons), we invited their members to log on to our website voluntarily and participate in the trial. All members were informed of our research and supplied with the website address and an entry code. Four months later, all 625 + 404 members were asked to participate again and complete the classification part a second time, now with the cases presented in a different order. Because participation was voluntary, we limited the questionnaire to five cases to prevent participants from abandoning it prematurely.
The internet website for DiRECT was constructed and consisted of two sections:
· Personal data: discipline, affiliation, years of clinical trauma experience.
· Radiographs: five sets (anteroposterior and lateral view) of distal radial fractures, selected by the senior author (DP) to ensure full representation of the spectrum of distal radial fractures. For each of the five cases the participant was asked to classify the fracture. Each classification was illustrated in words and a diagram (Figure 1).
Answers were given by marking the box adjacent to the illustration that represented the correct classification according to the observer. An option was also provided to correct a score. In total, 20 responses (five cases × four classifications) were requested.
After approval was obtained from the other authors, the scores selected by the senior author (DP) constituted the gold standard.
Statistics
Intraobserver reproducibility for individual cases and for all cases collectively was tested with Cohen's Kappa statistics (Landis and Koch, 1977a; Landis and Koch, 1977b). Kappa measures the level of agreement between categorical ratings, corrected for chance. Where observers generated only matching values for a variable, a Kappa statistic could not be calculated. We interpreted the Kappa results as < 0.20 poor, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 good and 0.81-1.00 very good (Landis and Koch, 1977a; Landis and Koch, 1977b).
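To make the chance-corrected agreement concrete, the weighted Kappa for two raters on an ordinal scale can be sketched in plain Python. This is an illustrative helper written for this text, not part of the study's actual analysis pipeline:

```python
def weighted_kappa(rater1, rater2, categories, weights="linear"):
    """Cohen's weighted kappa for two ordinal ratings (chance-corrected agreement).

    rater1, rater2: equal-length lists of ratings drawn from `categories`.
    categories: ordered list of possible ratings (e.g. Older grades 1..4).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)

    # Observed joint proportions: obs[i][j] = fraction rated i by rater 1, j by rater 2
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        obs[index[a]][index[b]] += 1.0 / n

    # Marginal proportions per rater (chance-expected agreement uses their product)
    p1 = [sum(obs[i][j] for j in range(k)) for i in range(k)]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    # Agreement weight: 1 on the diagonal, decreasing with distance between grades
    def w(i, j):
        d = abs(i - j) / (k - 1)
        return 1.0 - d if weights == "linear" else 1.0 - d ** 2

    po = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))       # observed
    pe = sum(w(i, j) * p1[i] * p2[j] for i in range(k) for j in range(k))   # by chance
    return (po - pe) / (1.0 - pe)   # undefined when pe == 1 (only matching values)
```

Identical ratings give a kappa of 1.0; the final comment mirrors the situation reported as "not calculable" above, where raters generate only matching values.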
Pearson correlation coefficients were used to test intraobserver reliability. Next, the interobserver relationship between the classifications scored by experienced (≥ 6 years of clinical practice) and less experienced observers (< 6 years of clinical practice) was determined. Spearman correlation coefficients were used to establish between-group relationships and the relationships between the experienced or less experienced group and the gold standard as established by the senior investigator (DP). Compliance with test assumptions was checked; where the data were not normally distributed, either parametric testing (relying on the central limit theorem when appropriate) or non-parametric testing was performed. All tests were two-tailed with a 0.05 level of significance.
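The Spearman coefficient used above is simply a Pearson correlation computed on ranks, which is what makes it suitable for ordinal classification grades. A minimal sketch (hypothetical helpers, with tie-aware average ranking):

```python
def ranks(xs):
    """Assign average ranks (1-based), sharing the mean rank among ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                       # extend the block of tied values
        avg = (i + j) / 2.0 + 1.0        # mean of the tied positions, 1-based
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Sample Pearson correlation (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson on the rank-transformed data."""
    return pearson(ranks(x), ranks(y))
```

Perfectly concordant rankings yield +1 and perfectly reversed rankings yield -1, regardless of the raw values, which is why rank correlation tolerates the arbitrary numbering of fracture grades.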
Figure 1. Example of one of the five cases with one of the classification systems (Older, as yet unanswered) as used on the internet website DiRECT. The four Older grades were presented as follows, each with an answer box (Older 1, Older 2, Older 3, Older 4):

Older 1 – Nondisplaced:
· Loss of some volar angulation and up to 5° of dorsal angulation.
· No significant shortening; distal radius 2 mm or more above the distal ulna.

Older 2 – Displaced with minimal comminution:
· Loss of volar angulation or dorsal displacement of the distal fragment.
· Shortening usually not below the distal ulna, but occasionally up to 3 mm below it.
· Minimal comminution of the dorsal radius.

Older 3 – Displaced with comminution of the dorsal radius:
· Comminution of the distal radius.
· Shortening usually below the distal ulna.
· Comminution of the distal radius fragment usually not marked and often characterised by large pieces.

Older 4 – Displaced with severe comminution of the distal radius:
· Marked comminution of the dorsal radius.
· Distal radial fragment shattered by comminution.
· Shortening usually 2-8 mm below the distal ulna.
· Poor volar cortex in some cases.
RESULTS
The five sets of selected radiographs were reviewed twice, with a four-month interval, by 45 observers. Of the 450 (45 × 5 × 2) fractures classified with each of the AO/ASIF, Frykman, Fernandez and Older classifications, correspondence with the final consensus (the gold standard) appeared in only 55 fractures for the AO/ASIF classification main group, 45 for Fernandez, 45 for Frykman and 45 for Older.
The weighted Kappa values for intraobserver reproducibility were 0.52 for the AO/ASIF classification, 0.26 for Frykman, 0.42 for Fernandez and 0.27 for Older (Table 1). These data represent an overall agreement between the two measurements in time ranging from 'fair' to 'moderate' (Fleiss, 1981). The precision of these measurements is modest, resulting in rather wide confidence intervals.
Intraobserver correlation between our internet and gold standard scores lacked significance for all classifications in both measurement rounds, with only one exception for the Frykman classification in round two (Table 2).
Table 1. Weighted Kappa values for intraobserver agreement of four distal radial fracture classification systems.

Value                  AO/ASIF     Frykman          Fernandez   Older
Weighted kappa         0.52        0.26             0.42        0.27
Confidence interval    0.37-0.63   Not calculable   0.20-0.58   Not calculable
Number                 243         161              238         237
Table 2. Pearson correlation coefficient for intraobserver reliability of four distal radial fracture classification systems.

Classification                    r       p      n
AO/ASIF first measurement         0.23    0.00   500
AO/ASIF second measurement*       0.14    0.00   554
Frykman first measurement         0.13    0.00   497
Frykman second measurement*       0.03    0.43   547
Fernandez first measurement       0.28    0.00   494
Fernandez second measurement*     0.21    0.00   546
Older first measurement          -0.22    0.00   492
Older second measurement*        -0.19    0.00   546
* Second measurement made 4 months after the first.
25
After dividing the observers into two groups based on years of experience, a Spearman correlation coefficient was calculated for all five classified cases to determine interobserver reliability. This correlation was poor but statistically significant, both for scores between observers (Figure 2) and between observer and our gold standard (Table 3).
Table 3. Spearman rank correlation coefficient with p-value for interobserver reliability of four distal radial fracture classification systems.

                 Group 1 vs Group 2         Group 1 vs GS               Group 2 vs GS
Classification   Round 1      Round 2*      Round 1       Round 2*      Round 1       Round 2*
AO               0.10 (0.04)  0.10 (0.05)   0.08 (0.06)   0.10 (0.15)   0.12 (0.05)   0.10 (0.08)
Fernandez        0.16 (0.00)  0.16 (0.00)   0.13 (0.02)   0.14 (0.03)   0.07 (0.06)   0.10 (0.03)
Frykman          0.10 (0.05)  0.13 (0.01)   0.09 (0.09)   0.06 (0.10)   0.05 (0.31)   0.05 (0.41)
Older            0.15 (0.00)  0.20 (0.00)  -0.08 (0.31)  -0.72 (0.20)  -0.08 (0.10)  -0.11 (0.12)

Group 1: responders < 6 yr clinical experience; Group 2: responders ≥ 6 yr clinical experience; GS: gold standard; vs: versus.
* Second round of observations made 4 months after the first round.
DISCUSSION
Confirming the results of previous studies (Andersen et al., 1991; Kreder et al., 1996; Illarramendi et al., 1998; Oskam et al., 2001), our Kappa values for intraobserver agreement were also relatively low, but our study covered a full range of the most popular classification systems for distal radial fractures (Table 4).
The AO/ASIF classification seems to be the most reliable; by limiting the number of subgroups its reliability could probably be enhanced further, although this would diminish the intended precision of the classification. The fact that we consulted clinical trauma physicians, and not only specialists with vast experience with the tested classifications (Kreder et al., 1996), could explain the apparently poor intraobserver and interobserver reproducibility of the four classifications. Still, in our opinion this best reflects the physicians who commonly use the classification systems for communication and for guiding the choice of treatment.
All data indicate poor correlation, for intraobserver and interobserver reproducibility as well as across levels of experience, which makes the four tested classifications for distal radial fractures virtually unsuitable for daily practice. This conclusion is also supported by the literature (Kreder et al., 1996; Andersen et al., 1996; Flikkila et al., 1998; Illarramendi et al., 1998).
A limitation of our study is the small number of cases tested by the (large) panel. We intentionally chose five cases, as we anticipated this to be the maximum the physicians would "tolerate" while still completing the survey in full. With five cases, however, we did not test the whole spectrum of each classification, which could negatively affect the Kappa for interobserver reliability, but should not influence intraobserver variability. Using only a moderate number of cases could also bias the Kappa in the case of a "borderline" fracture for a certain classification, and certain fracture patterns could well (dis)favour one or several of the tested classification systems. As our results indicate a fairly regular distribution of interobserver scores over each case and
Table 4. Intraobserver Kappa values reported in the literature, with numbers of observers and cases.

Report                      Observers   Cases   Kappa
Andersen et al., 1996       4           55      0.57-0.70
Kreder et al., 1996         36          30      0.75
Illarramendi et al., 1998   6           200     0.57; 0.61
Oskam et al., 2001          2           124     0.56
Andersen et al., 1991       4           185     0.75
DiRECT                      45          5       0.52 (AO/ASIF); 0.26 (Frykman); 0.42 (Fernandez); 0.27 (Older)

DiRECT, Distal Radial fracture Electronic Classification Trial.
Figure 2. Interobserver scores (number of responders) for individual cases (cases 1-5) and for each classification separately (AO/ASIF, Frykman, Fernandez and Older).
ACKNOWLEDGEMENT
REFERENCES
1. Andersen D J, Blair W F, Steyers C M Jr, Adams B D, el Khouri G Y, Brandser E A. Classification of distal radius fractures: an analysis of interobserver reliability and intraobserver reproducibility. J Hand Surg [Am] 1996; (21): 574-582.
2. Andersen G R, Rasmussen J B, Dahl B, Solgaard S. Older's classification of Colles' fractures. Good intraobserver and interobserver reproducibility in 185 cases. Acta Orthop Scand 1991; (62): 463-464.
3. Fernandez D L. Fractures of the distal radius: operative treatment. Instr Course Lect 1993; (42): 73-88.
4. Fleiss J L. Statistical Methods for Rates and Proportions. John Wiley & Sons, New York 1981.
5. Flikkila T, Nikkola-Sihto A, Kaarela O, Paakko E, Raatikainen T. Poor interobserver reliability of AO classification of fractures of the distal radius. Additional computed tomography is of minor value. J Bone Joint Surg Br 1998; (80): 670-672.
6. Frykman G. Fracture of the distal radius including sequelae--shoulder-hand-finger syndrome, disturbance in the distal radio-ulnar joint and impairment of nerve function. A clinical and experimental study. Acta Orthop Scand 1967; Suppl. 108: 1-155.
7. Illarramendi A, Gonzalez Della Valle A, Segal E, De Carli P, Maignon G, Gallucci G. Evaluation of simplified Frykman and AO classifications of fractures of the distal radius. Assessment of interobserver and intraobserver agreement. Int Orthop 1998; (22): 111-115.
8. Johnstone D J, Radford W J, Parnell E J. Interobserver variation using the AO/ASIF classification of long bone fractures. Injury 1993; (24): 163-165.
9. Kreder H J, Hanel D P, McKee M, Jupiter J, McGillivary G, Swiontkowski M F. Consistency of AO fracture classification for the distal radius. J Bone Joint Surg Br 1996; (78): 726-731.
10. Landis J R, Koch G G. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 1977a; (33): 363-374.
11. Landis J R, Koch G G. The measurement of observer agreement for categorical data. Biometrics 1977b; (33): 159-174.
12. Oskam J, Kingma J, Klasen H J. Interrater reliability for the basic categories of the AO/ASIF’s system as a frame of reference for classifying distal radial fractures. Percept Mot Skills 2001; (92): 589-594.