
4 Pathologic classification of diabetic nephropathy; reproducibility and validation


Academic year: 2021


Elisabeth J.J. Valk, Celine Q.F. Klessens, Olaf M. Dekkers, Ron Wolterbeek, Kerstin Amman, Adam H. Cohen, Terry H. Cook, Cinthia B. Drachenberg, Franco Ferrario, Agnes B. Fogo, Kensuke Joh, Laure-Helene Noël, Jai Radhakrishnan, Surya V. Seshan, Jan A. Bruijn, Mark Haas, Ingeborg M. Bajema, Antien L. Mooyaart

Submitted


Abstract

In 2010, a pathological classification for diabetic nephropathy (DN) was developed.

Most validation studies of this classification showed a significant association with renal outcome, but either compared different combinations of classes with outcome or had insufficient power. An adequate reproducibility study of this classification system is missing. Therefore, we performed a reproducibility study together with a meta-analysis study for all validation studies to better estimate the prognostic role of the different DN classes.

To assess interobserver agreement, a DVD with 13 digitized biopsies of DN patients was sent to all members of the Renal Pathology Society. Interobserver agreement was determined by intraclass correlation coefficients (ICC). Additionally, data were extracted from the validation studies and a meta-analysis for the different DN classes was performed.

The ICC for DN class was 0.74. Additional parameters had the following ICCs: IFTA, 0.72; interstitial inflammation, 0.47; arteriolar hyalinosis, 0.41; and arteriosclerosis, 0.43. ICCs of all pathologists did not differ substantially from ICCs of pathologists with a high level of expertise. The meta-analysis showed an increased risk for a poor renal outcome for each of the classes IIb, III and IV compared to class IIa (p<0.0001).

This study shows that the DN classification is suitable for clinical practice. Our meta-analysis data show a good relation with renal prognosis over the classes of DN as currently defined. Furthermore, the DN classification has relatively good reproducibility, although improvements could be made by fine-tuning definitions.


Introduction

Pathological classification systems exist for several renal diseases such as lupus nephritis [1], IgA nephropathy [2] and ANCA-associated glomerulonephritis [3]. In 2010, a classification system was developed for diabetic nephropathy (DN) [4]. This classification is primarily based on glomerular damage, whereas interstitial and vascular lesions are scored separately. The classification system can be used for DN in both type 1 and type 2 diabetes mellitus patients.

In general, classification systems are created to provide better communication between pathologists and clinicians, and the possibility to link diagnostic information together with prognostic indication in order to guide therapeutic decision making. By means of subsequent studies focusing on the clinical validation and interobserver agreement of classification systems, the systems are continuously modified and improved.

For the pathological classification of DN, several validation studies of the DN classes showed an association with renal outcome. However, all these studies combined classes and, moreover, combined them in different ways in order to have adequate power to find an association with outcome [5-9]. The reproducibility of the DN classification was investigated in the original article and in one validation study, but these reproducibility studies were relatively small [4, 8].

Therefore, the aim of the present study was to perform an adequate reproducibility study by means of a survey of members of the Renal Pathology Society (RPS). Based on the data obtained, we here provide suggestions for clarifications of some of the definitions. Furthermore, we performed a meta-analysis of all validation studies published to date, to adequately estimate the prognostic role of the different DN classes.

Methods

Reproducibility study; Case selection and survey

For this study, 13 biopsies of DN patients were selected from the archives at Leiden University Medical Center. The renal damage within these cases was exclusively attributable to DN. The representative cases contained the full spectrum of lesions which can be found in DN. For each case, high-quality Periodic Acid-Schiff (PAS) and silver stains were provided. The biopsies were handled, coded and anonymized according to the Dutch National Ethical guidelines.


In order to assess interobserver agreement, slides were digitized and saved as an HTML document on a DVD. This DVD was sent to all 360 members of the RPS with an invitation to participate in the project. Additionally, an Excel response sheet with a flowchart of the classification, instructions, and the original article of the classification were provided on the DVD [4]. The participants were asked to add remarks when difficulties occurred while scoring. The participating pathologists were also asked to self-assess their level of expertise, on the basis of which the following groups were defined: all pathologists; the subgroup of pathologists with a high level of expertise; the subgroup of pathologists with low or moderate expertise; and those who did not report their level of expertise.

Pathological classification system of DN

All cases were categorized based on the pathological classification of DN as previously described. For the exact details of the DN classification, we refer to the original paper [4]. In brief, the classification encompasses 4 classes: class I, glomerular basement membrane thickening by electron microscopy without specific light microscopic changes; class IIa, mild mesangial expansion (mild mesangial expansion in >25% of the observed mesangium); class IIb, severe mesangial expansion (severe mesangial expansion in >25% of the observed mesangium); class III, at least one lesion with nodular sclerosis; class IV, global glomerulosclerosis in >50% of glomeruli.
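The brief class definitions above can be read as a decision rule. The following is a minimal sketch, assuming the most advanced lesion is checked first (IV, then III, IIb, IIa, I); that ordering and all parameter names are illustrative assumptions, and the flowchart in the original paper [4] remains the authoritative reference.

```python
from typing import Optional


def assign_dn_class(
    global_sclerosis_fraction: float,   # fraction of glomeruli with global sclerosis (0..1)
    has_nodular_sclerosis: bool,        # at least one lesion with nodular sclerosis
    severe_mesangial_fraction: float,   # fraction of mesangium with severe expansion (0..1)
    mild_mesangial_fraction: float,     # fraction of mesangium with mild expansion (0..1)
    gbm_thickening_on_em: bool,         # basement membrane thickening seen by electron microscopy
) -> Optional[str]:
    """Return the DN class, or None if no criterion is met.

    Assumption: the checks run from the most to the least advanced class.
    """
    if global_sclerosis_fraction > 0.50:
        return "IV"
    if has_nodular_sclerosis:
        return "III"
    if severe_mesangial_fraction > 0.25:
        return "IIb"
    if mild_mesangial_fraction > 0.25:
        return "IIa"
    if gbm_thickening_on_em:
        return "I"
    return None
```

With this ordering, a biopsy showing a single nodule and otherwise mild mesangial expansion is assigned class III, which is the case several survey participants commented on.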

Interstitial fibrosis and tubular atrophy (IFTA) and interstitial inflammation are scored on a semiquantitative scale. Arteriolar hyalinosis is scored 0 if it is absent, 1 if at least one arteriole with hyalinosis is present, and 2 if more than one arteriole with hyalinosis is observed in the entire biopsy. Arteriosclerosis is scored as follows: 0 for no intimal thickening, 1 for intimal thickening less than the thickness of the media, and 2 for intimal thickening more than the thickness of the media.
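The two vascular scores can be written as small functions. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the text does not say how to score intimal thickening that exactly equals the media thickness, so that tie is scored 1 here by assumption.

```python
def arteriolar_hyalinosis_score(n_hyalinized_arterioles: int) -> int:
    """0 = absent, 1 = one arteriole with hyalinosis, 2 = more than one."""
    if n_hyalinized_arterioles < 0:
        raise ValueError("count cannot be negative")
    return min(n_hyalinized_arterioles, 2)


def arteriosclerosis_score(intimal_thickening: float, media_thickness: float) -> int:
    """0 = no intimal thickening, 1 = less than the media, 2 = more than the media.

    Both thicknesses must be in the same unit.
    Assumption: thickening equal to the media thickness is scored 1.
    """
    if intimal_thickening <= 0:
        return 0
    if intimal_thickening > media_thickness:
        return 2
    return 1
```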

Eligibility criteria, data selection and extraction of validation studies

To provide an overview of clinical outcome studies in relation to the pathological classification of DN, all articles citing the original manuscript were collected via a search on Web of Science and Google Scholar. To ensure maximum sensitivity, no limits or filters were used in the searches. Language restrictions were not included in the initial search. This search was performed by a trained librarian in April 2016. Two observers (CK and LV) independently reviewed all studies to include all validation studies of the pathological classification of DN which associated DN class with renal outcome. Furthermore, an expert in the field (IB) was consulted to ensure that all validation studies investigating DN class were included.

Included were studies investigating type 1 and 2 diabetes correlating DN class with renal outcome. Renal outcome was defined as end stage renal disease (ESRD) or doubling of serum creatinine. Studies were excluded which only investigated interstitial lesions or which used the same cohort in relation to different clinical outcome parameters. In the latter case, we included the first published validation study in our meta-analysis. Data were extracted if possible. Preferably hazard ratios were extracted with confidence intervals, otherwise these data were extrapolated from Kaplan-Meier curves or data on absolute risks.

Statistical analysis

For all the parameters in this study a reliability analysis was conducted by calculating the intraclass correlation coefficient (ICC) (0 = no agreement, 1 = perfect agreement). In this calculation, the given answers were compared between participants rather than with a ‘gold standard’. ICCs were calculated using a mixed model to estimate the variance components of the ICC. An ICC of >0.75 was considered to show excellent reproducibility, an ICC of 0.4 to 0.75 to indicate fair to good reproducibility, and an ICC of <0.4 to indicate poor reproducibility [10]. These analyses were performed using SPSS Statistics 20.0 (IBM, Armonk, NY). For the meta-analysis of the validation studies a generic inverse variance method was used in a random effects model, and analyses were performed in Review Manager (RevMan) version 5.3. Preferably, hazard ratios from the individual studies were used with class IIa as a reference group. If these were not available, relative risks and standard errors were calculated with the available data.
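To make the ICC and its interpretation bands concrete, here is a minimal sketch. It is not the SPSS mixed-model procedure used in the study: it swaps in the ANOVA-based two-way random-effects, single-rater ICC (often written ICC(2,1)), and the function names and toy data are hypothetical.

```python
from typing import List


def icc_two_way_random(scores: List[List[float]]) -> float:
    """ANOVA-based ICC(2,1); scores[i][j] is the score of case i by rater j."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    case_means = [sum(row) / k for row in scores]
    rater_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    # Sums of squares for cases (rows), raters (columns) and residual error.
    ss_cases = k * sum((m - grand) ** 2 for m in case_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cases - ss_raters
    ms_cases = ss_cases / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_cases - ms_error) / (
        ms_cases + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


def interpret_icc(icc: float) -> str:
    """Bands as stated in the text: >0.75, 0.4 to 0.75, <0.4."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.4:
        return "fair to good"
    return "poor"


# Toy data: three cases scored identically by three raters (perfect agreement).
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(icc_two_way_random(toy), interpret_icc(0.74))
```

Applied to the values reported in this study, `interpret_icc(0.74)` falls in the fair-to-good band and `interpret_icc(0.76)` in the excellent band.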

Heterogeneity within the studies was estimated by the I², which is the percentage of the total variation across studies due to heterogeneity rather than chance. An I² of 25%, 50% or 75% was considered low, moderate or high, respectively.
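The pooling and heterogeneity steps can be sketched as follows. This is an illustration, not a reproduction of the RevMan analysis: it assumes the DerSimonian-Laird estimator for the between-study variance, works on log hazard ratios, and uses hypothetical function and variable names.

```python
import math
from typing import List, Tuple


def pool_random_effects(
    log_effects: List[float], std_errors: List[float]
) -> Tuple[float, float, float]:
    """Return (pooled log effect, its standard error, I² in percent)."""
    # Fixed-effect inverse-variance weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum(w)
    # Cochran's Q and its degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    df = len(log_effects) - 1
    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau² to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    # I²: share of total variation attributable to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, se_pooled, i2
```

The pooled hazard ratio is `math.exp(pooled)`, with a 95% confidence interval of `math.exp(pooled ± 1.96 * se_pooled)`.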

Results

Reproducibility study

A total of 13 biopsies with lesions attributable to DN were scored by 77 pathologists from 28 different countries, of which 38 (49%) had a self-assessed high level of expertise, 19 (25%) had a moderate level, 3 (4%) had a low level and 21 (22%) did not mention their level of expertise. The response rate of the reproducibility study was 21.4% (77/360).

Table 1. Intraclass correlation (ICC) of pathologic classification of DN

ICC score                    All pathologists    Pathologists with high expertise level
DN class                     0.74                0.76
IFTA                         0.72                0.73
Interstitial inflammation    0.47                0.57
Arteriolar hyalinosis        0.41                0.43
Arteriosclerosis             0.43                0.44

ICCs of all pathologists did not differ substantially from ICCs of pathologists with a high level of expertise.


Table 1 gives an overview of the intraclass correlations (ICC) of the lesions scored by all participating pathologists and by the subgroup of pathologists with a high level of expertise. ICCs of all pathologists did not differ substantially from ICCs of pathologists with a high level of expertise. The ICC for glomerular lesions (i.e. DN class) amongst all pathologists was 0.74; amongst pathologists with high expertise it was 0.76. The best concordance was found in classes III and IV; the least concordance was found in classes I and II, and in the subdivision of class II.

IFTA had an ICC score of 0.72. Severe IFTA cases had more concordance compared to cases with mild IFTA involvement. Regarding the vascular lesions, arteriolar hyalinosis had an overall reproducibility of 0.41. Arteriosclerosis had an overall interobserver agreement of 0.43. The participating members of the RPS could provide separate remarks in addition to the scoring system. Table 2 provides an overview of certain remarks, followed by our recommendations to clarify these definitions.

Meta-analysis on validation studies

The initial search found 258 studies, of which 240 studies were excluded because these were case reports, reviews, or studies which used the classification system in an experimental setting, but were not validation studies. Finally, 12 studies were regarded as possible validation studies; however, some only investigated IFTA with renal outcome, or the same cohort was used in more than one study. Therefore, in the end, 4 validation studies were included in the meta-analysis (Figure 1). The basic characteristics of these studies are provided in Table 3. In the studies of Mise et al. and Okada et al., hazard ratios with confidence intervals (from which the standard error could be calculated) were available. In the study by Okada et al. no data for class I were available. In the study by Oh et al. the described absolute risks were used to calculate a relative risk and standard error. In the study by An et al. a hazard ratio and confidence interval were available for all classes together. The hazard ratio for each individual class was extrapolated from this hazard ratio, and the standard error was calculated from the relative risk, which was extrapolated from the Kaplan-Meier curves.
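Deriving a relative risk and its standard error from absolute risks can be illustrated as below. The source does not state which formula was applied; this sketch assumes the standard large-sample formula for the standard error of a log relative risk, and the event counts are hypothetical, not taken from any of the included studies.

```python
import math
from typing import Tuple


def relative_risk_with_se(
    events_exposed: int, n_exposed: int, events_reference: int, n_reference: int
) -> Tuple[float, float]:
    """Return (relative risk, standard error of the log relative risk)."""
    risk_exposed = events_exposed / n_exposed
    risk_reference = events_reference / n_reference
    rr = risk_exposed / risk_reference
    # Standard large-sample approximation for SE(log RR).
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_reference - 1 / n_reference
    )
    return rr, se_log_rr


# Hypothetical counts: 12 of 20 events in one class versus 5 of 25 in the reference class.
rr, se = relative_risk_with_se(12, 20, 5, 25)
print(f"RR = {rr:.2f}, SE(log RR) = {se:.3f}")
```

The log relative risk and its standard error are the two quantities a generic inverse variance analysis takes as input for each study.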


Table 2. Recommendations survey for DN classification

Glomerular lesions
  DN class
    Remark: Dividing class I and II: definition and cut-off points criticized as difficult.
    Recommendation: Define more straightforward definitions for mesangial alterations and examine these in future validation studies.
    Remark: Class III: only one nodule in cases with overall mild mesangial expansion.
    Recommendation: The formation of nodular sclerosis may be a specific trait of some patients with DN, who are not yet more distinctly defined. Therefore, recognition of one nodule seems appropriate to designate a specific class of DN. Results from the meta-analysis indicate a specific outcome for this class.

Interstitial lesions
  IFTA
    Remark: IFTA is a good predictor for renal outcome, but can be the result of other renal diseases.
    Recommendation: Take the severity of IFTA into account and specifically note it in all biopsy reports.
  Interstitial inflammation
    Remark: The relatively low ICC of interstitial inflammation.
    Recommendation: Clarification of this definition: only score inflammation in areas without IFTA. Determine the relative effects of interstitial inflammation in non-scarred areas versus total interstitial inflammation on the prognosis of DN in further studies.

Vascular lesions
  Arteriolar hyalinosis
    Remark: Unclear in which vessel type arteriolar hyalinosis needs to be scored.
    Recommendation: Just identify and mention hyalinosis if present; most of the hyalinosis will occur in arterioles.
  Arteriosclerosis
    Remark: What is the definition of a large vessel?
    Recommendation: Eliminate the vessel size from the definition, and focus only on the presence of intimal thickening/fibrosis in vessels which are larger than arterioles.

Based on the remarks obtained from the reproducibility study, recommendations for each parameter were proposed to improve the use of the DN classification.


Figure 2 shows the results of our meta-analyses of the validation studies. The pooled hazard ratio of class I versus class IIa was 0.49 (95% CI 0.13-1.90, p=0.30). The pooled hazard ratio of class IIb versus class IIa was 2.96 (95% CI 1.82-6.05, p<0.00001), showing an increased risk of a poor renal outcome in patients with class IIb compared to class IIa. For class III versus class IIa a significant difference was also seen, with a pooled hazard ratio of 5.26 (95% CI 2.75-10.04, p<0.00001), showing that patients with class III have an increased risk of a poor renal outcome compared to class IIa. Finally, class IV showed a poorer renal prognosis than class IIa, with a hazard ratio of 11.23 (95% CI 4.56-27.68, p<0.00001). There is low to moderate heterogeneity between the groups (class IIa vs I, I² = 0%; class IIa vs IIb, I² = 0%; class IIa vs III, I² = 49%; and class IIa vs IV, I² = 67%).
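A reported hazard ratio and its confidence interval can be converted back into a standard error and an approximate p-value. The sketch below assumes that the 95% interval is symmetric on the log scale and that a two-sided Wald test is used; the function name is hypothetical, and the inputs are two of the pooled estimates quoted above.

```python
import math
from typing import Tuple


def wald_p_from_ci(
    hr: float, lower: float, upper: float, z_crit: float = 1.959964
) -> Tuple[float, float, float]:
    """Return (SE of log HR, z statistic, two-sided p-value)."""
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


se_i, z_i, p_i = wald_p_from_ci(0.49, 0.13, 1.90)        # class I vs IIa
se_iii, z_iii, p_iii = wald_p_from_ci(5.26, 2.75, 10.04)  # class III vs IIa
print(f"class I vs IIa:   p = {p_i:.2f}")
print(f"class III vs IIa: p = {p_iii:.1e}")
```

Under these assumptions the class I comparison gives a p-value of about 0.30, in line with the reported value, and the class III comparison gives a p-value well below 0.00001.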

Figure 1. Flowchart illustrating how the validation studies of the DN classification were selected for the meta-analysis


Table 3. Overview of validation studies using the histopathological classification of DN

Okada et al. 2011
  Diabetes type: Type 2 diabetes
  Number of patients: N = 69
  Ethnicity: Japanese
  Patient characteristics: Patients with overt proteinuria and biopsy-confirmed diabetic nephropathy with mesangial expansion
  Follow-up: Mean and median follow-up duration was 59 ± 41 and 52 months (range 6–180 months)
  Renal outcome: Chronic dialysis or doubling of serum creatinine

Oh et al. 2012
  Diabetes type: Type 2 diabetes
  Number of patients: N = 126
  Ethnicity: Korean
  Patient characteristics: 50 patients with pure DN, 65 with non-diabetic renal disease and 11 mixed. Only the 50 pure DN patients were scored following the DN classification
  Follow-up: Follow-up during 69.2 ± 35.2 (0.4–137.6) months after renal biopsy to detect ESRD
  Renal outcome: End-stage renal disease

Mise et al. 2014
  Diabetes type: Diabetes mellitus
  Number of patients: N = 205
  Ethnicity: Japanese
  Patient characteristics: Patients with renal biopsy and inclusion study criteria (eGFR >10 mL/min/1.73 m², biopsy >10 glomeruli)
  Follow-up: The mean follow-up period was 62.9 ± 68.3 months
  Renal outcome: Renal death defined as dialysis by end-stage renal disease

An et al. 2014
  Diabetes type: Type 2 diabetes
  Number of patients: N = 396
  Ethnicity: Chinese
  Patient characteristics: Patients with biopsy-proven DN
  Follow-up: At least one year follow-up
  Renal outcome: Renal outcome defined as progression to end-stage renal disease or doubling of serum creatinine


Figure 2. Meta-analysis of DN classes of the performed validation studies at 80 month follow-up. Panels: IIa vs I; IIa vs IIb; IIa vs III; IIa vs IV.


Discussion

In 2010, we launched the histopathological classification for diabetic nephropathy, which since then has been used in multiple research and diagnostic settings. In general practice, nephropathologists worldwide make use of the classification, but fine-tuning of some of the definitions of lesions may be appropriate. Therefore, a survey was launched through the RPS to obtain insight into interobserver agreement amongst nephropathologists worldwide and into issues arising in day-to-day practice. With regard to definitions, we here summarize the comments from RPS members who joined in our survey. Whereas reproducibility proved to be sufficient for classes III and IV, disagreement among observers was noticed for classes I, IIa and IIb. Although many studies citing the classification are present in the literature, a careful investigation showed that only a limited number of validation studies had actually been conducted, i.e. those with a specific aim to validate the classes of the DN classification. All these studies came from Asia. We also performed a meta-analysis of these studies, showing that the DN classification has a correlation with renal outcome for most classes using class IIa as a reference, which underscores the clinical usefulness of the classification system. For class I versus IIa no significant difference was found in this meta-analysis. This is likely explained by the fact that the available studies were underpowered for this comparison, as only a small number of patients were investigated with only few events (especially in class I).

Because the meta-analysis showed a good association with renal outcome for most classes, it seems appropriate to maintain the subdivision as previously described. However, more straightforward definitions for mesangial alterations to distinguish between classes I, IIa and IIb are called for. These need to be examined in future validation studies using modifications of the original definitions, tested for both interobserver agreement and correlation with clinical outcomes. For the moment, no clear-cut suggestions came out of the survey for an intermediate solution.

With respect to classes III and IV, there was good interobserver agreement in the recognition of these classes and comments from participants were few. There was a concern from some participants whether the presence of one nodule was sufficient to classify a sample as class III DN, especially in cases with overall mild mesangial expansion. We postulate that the formation of nodular sclerosis may be a specific trait of some patients with DN, who are not yet more distinctly defined [11]. Therefore, recognition of one nodule still seems appropriate to designate a specific class of DN (i.e. class III), and results from the meta-analysis indicate a specific outcome for this class by its current definition.


Our reproducibility study showed a high ICC for IFTA, similar to findings in the original Oxford IgA nephropathy study [2]. A common remark from survey participants was that IFTA could have been the result of other renal diseases. The point about IFTA being a good predictor for renal outcome despite its nonspecific appearance in virtually all renal diseases has been frequently raised. IFTA is not a primary parameter of the DN classification, but several validation studies on IFTA showed that IFTA has impact on renal prognosis in diabetic nephropathy [5, 7-9, 12]. Because of the prognostic value of IFTA it is certainly useful to take the severity of IFTA into account during evaluation of the biopsy, and the amount of IFTA should be specifically noted in all renal biopsy reports in cases of DN. The ICC of interstitial inflammation was remarkably low. This could be the result of lack of clarity in the definition of whether or not to score inflammation in areas with or without IFTA. In concordance with other guidelines for scoring inflammation in renal diseases, it would seem appropriate also in DN only to score inflammation in areas without IFTA. Future studies need to clarify the definition of interstitial inflammation and should determine the relative effects of interstitial inflammation in non-scarred areas versus total interstitial inflammation on the prognosis of diabetic nephropathy.

According to the original manuscript of the DN classification, arteriosclerosis needs to be scored in large vessels. However, the definition of a large vessel was perceived as unclear. A straightforward and simple solution would be to eliminate the vessel size from the definition, and to focus only on the presence of intimal thickening/fibrosis in vessels which are larger than arterioles. In addition, the vessel type and/or size in which arteriolar hyalinosis needs to be scored was also not evident. A straightforward solution here would be just to identify and mention hyalinosis if present – given that the lesion is relatively easy to recognize in a PAS-staining – and most of the hyalinosis lesions will occur in arterioles.

In the present study we reflect on problematic issues of the DN classification by using a two-way approach, namely through a survey of renal pathologists from the RPS and by reviewing the literature by means of a meta-analysis of validation studies so far performed. Each of these routes has its own limitations. The survey approach may have suffered from a bias in response due to factors outside our control, because RPS members could become involved in this part of the study by their own initiative. Nevertheless, participants in this study on diabetic nephropathy appeared to reflect the RPS membership relatively well with regard to the participants’ distribution among different countries and with respect to the range of experience among our participants. During the evaluation of the validation studies included in our meta-analysis, we observed some limitations of these studies that merit discussion. The performed validation studies had different study designs, and not all glomerular classes were included in all studies. For example, Okada et al. did not include class I in their study [7]. Some of the studies, especially those by Oh et al. and Okada et al., were very small. Furthermore, all studies were performed in Asian populations, and therefore it could be debated whether these data can be fully extrapolated to other populations.

This study shows that the pathologic classification of DN has relatively good reproducibility, but improvements can be made by fine-tuning definitions. Our meta-analysis data showed a good relation with renal prognosis over the classes of DN as currently defined. On the basis of results from a two-way approach into reproducibility and prognostic value of the classification, we have listed the current issues with recommendations here. In an international workgroup on DN we are currently working on the modifications of the DN classification.

Acknowledgements

We very much appreciate the support of the RPS in providing us the opportunity to perform the reproducibility study. We thank all participating members of the RPS for their contribution to the study. Additionally, we would like to thank our librarian Jan Schoones for his excellent assistance in the search for all articles which cited the classification manuscript, enabling us to investigate the number of validation studies performed concerning the DN classification.


References

1. Weening JJ, D’Agati VD, Schwartz MM, et al. The classification of glomerulonephritis in systemic lupus erythematosus revisited. Kidney Int, 2004; 65: 521-30

2. Working Group of the International IgA Nephropathy Network and the Renal Pathology Society. The Oxford classification of IgA nephropathy: pathology definitions, correlations, and reproducibility. Kidney Int, 2009; 76: 546-56

3. Berden AE, Ferrario F, Hagen EC, et al. Histopathologic classification of ANCA-associated glomerulonephritis. J Am Soc Nephrol, 2010; 21: 1628-36

4. Tervaert TW, Mooyaart AL, Amann K, et al. Pathologic classification of diabetic nephropathy. J Am Soc Nephrol, 2010; 21: 556-63

5. Mise K, Hoshino J, Ubara Y, et al. Renal prognosis a long time after renal biopsy on patients with diabetic nephropathy. Nephrol Dial Transplant, 2014; 29: 109-18

6. Oh SW, Kim S, Na KY, et al. Clinical implications of pathologic diagnosis and classification for diabetic nephropathy. Diabetes Res Clin Pract, 2012; 97: 418-24

7. Okada T, Nagao T, Matsumoto H, et al. Histological predictors for renal prognosis in diabetic nephropathy in diabetes mellitus type 2 patients with overt proteinuria. Nephrology (Carlton), 2012; 17: 68-75

8. An Y, Xu F, Le W, et al. Renal histologic changes and the outcome in patients with diabetic nephropathy. Nephrol Dial Transplant, 2015; 30: 257-66

9. Zhu X, Xiong X, Yuan S, et al. Validation of the interstitial fibrosis and tubular atrophy on the new pathological classification in patients with diabetic nephropathy: A single-center study in China. J Diabetes Complications, 2016; 30: 537-41

10. Fleiss JL and Cohen J. Equivalence of Weighted Kappa and Intraclass Correlation Coefficient as Measures of Reliability. Educational and Psychological Measurement, 1973; 33: 613-19

11. Schwartz MM, Lewis EJ, Leonard-Martin T, et al. Renal pathology patterns in type II diabetes mellitus: relationship with retinopathy. The Collaborative Study Group. Nephrol Dial Transplant, 1998; 13: 2547-52

12. Shimizu M, Furuichi K, Toyama T, et al. Long-term outcomes of Japanese type 2 diabetic patients with biopsy-proven diabetic nephropathy. Diabetes Care, 2013; 36: 3655-62
