Can chemical structure predict reproductive toxicity?


Contact

Lidka Maślankiewicz

RIVM/Expert Centre for Substances (SEC) e-mail: lidka.maslankiewicz@rivm.nl

RIVM report 601200005/2005

Can chemical structure predict reproductive toxicity?

L. Maślankiewicz, E.M. Hulzebos, T.G. Vermeire, J.J.A. Müller and A.H. Piersma

This investigation has been performed by order and for the account of Directorate General for Environmental Protection, and Chemicals/Waste/Radiation Protection Directorate, within the framework of project 601200 Beoordelingsinstrumentarium: deelproject QSARs


Abstract

Can chemical structure predict reproductive toxicity?

Structure-Activity Relationships (SARs), including Quantitative SARs, are applied to the hazard assessment of chemicals. The need to establish their validity is all the more urgent considering the proposed new EU policy on chemicals, REACH, which stresses the need for non-animal testing. DEREKfW and the TSCA Chemical Category List of the New Chemicals Program of the US-EPA were chosen to predict reproductive toxicity for REACH purposes. DEREKfW is a software program that predicts toxicological properties using literature-based and expert-derived structural alerts, while the TSCA Chemical Category List is a document based on expert judgment and a category approach. We screened the performance of both models in recognizing substances that are classified for reproductive toxicity in the EU (based on experimental animal tests). As we limited our research to substances positive for reproductive toxicity, the rate of false positives could not be assessed. DEREKfW partially fulfils the OECD principles for good (Q)SARs. DEREKfW and the TSCA Chemical Category List did not recognize 90 and 77% of the substances classified for ‘impaired fertility’, and 81 and 82% of the substances classified for ‘harm to the unborn child’, respectively. Apart from one shared ‘alert/category’, DEREKfW and TSCA contain 7 ‘alerts’ and 10 categories, respectively. As the alerts in DEREKfW (a comprehensible and transparent tool) and the categories in the TSCA Chemical Category List are highlighted in this research, both can be used as an additional source of expert judgment when assessing chemicals for reproductive toxicity. However, we conclude that these models cannot be the only method for screening chemicals for reproductive toxicity in the framework of REACH. Other models or testing strategies have to be used to assess the reproductive toxicity of chemicals.


Rapport in het kort (Report in brief)

Can chemical structure predict reproductive toxicity?

Structure-activity relationships (SARs), including quantitative SARs, are used in the risk assessment of substances. The need to establish their validity is all the more urgent given the proposed EU policy on substances, REACH, which emphasises the reduction of animal testing. DEREKfW and the TSCA Chemical Category List were chosen to predict reproductive toxicity for REACH purposes. DEREKfW is a software program that predicts toxicological properties using structural alerts based on the literature and expert judgement, while the TSCA New Chemicals Program List of the US-EPA is based on expert judgement and chemical categories. We screened the two models on their recognition of substances that are classified for reproductive toxicity in the EU (based on experimental animal studies). Consequently, the rate of false positives could not be determined. DEREKfW and the TSCA Chemical Category List did not recognise 90 and 77% of the substances classified for ‘impaired fertility’, and 81 and 82% of the substances classified for ‘harm to the unborn child’, respectively. Apart from one shared ‘alert’, DEREKfW (a transparent model) and the TSCA Chemical Category List contain a further 7 ‘alerts’ and 10 categories, respectively. Because the alerts in DEREKfW and the TSCA Chemical Category List are highlighted in this study, they can be used as additional expert judgement. We conclude, however, that these models cannot be the only method for screening substances for reproductive toxicity in the framework of REACH. Other models and testing strategies are needed to assess the reproductive toxicity of substances.


Preface

The authors thank Edith Laraway and Liz Covey-Crump of LHASA for critically reading the report and for their suggestions. The authors also thank LHASA for permission to present the DEREKfW predictions for the chemicals classified for reproductive toxicity in the EU, and Betty Hakkert for her elucidating remarks.


Contents

Summary

Samenvatting

1. Introduction
1.1 Reproductive toxicity and REACH
1.2 Objectives
1.3 Structure of the report

2. Methods
2.1 Annex I, Annex VI and TGD
2.2 Definitions and terminology for the assessment of the model outcome
2.2.1 Definitions
2.2.2 Terminology for the assessment of the model outcome
2.3 The test set: Annex I substances
2.4 Classification and labelling for reproductive toxicity
2.5 DEREKfW – knowledge based system
2.5.1 General information on DEREKfW
2.5.2 Predicting part and Editor part
2.5.3 Likelihood levels in DEREKfW
2.5.4 DEREKfW alerts for reproductive toxicity
2.6 TSCA Chemical Categories List
TSCA chemical category for reproductive toxicity
2.7 DEREKfW – Performance procedure
2.7.1 Technical details of testing DEREKfW
2.7.2 Comparison DEREKfW versus Annex VI definitions
2.7.3 Interpretation of DEREKfW predictions
2.8 OECD (Q)SAR principles for (Q)SAR computer models
2.9 The TSCA Chemical Category List, performance procedure

3. Results
3.1 DEREKfW versus OECD (Q)SAR principles
3.2 Performance of DEREKfW and the TSCA Chemical Category List
3.2.1 The test set
3.2.2 Performance of DEREKfW
3.2.3 Performance of TSCA Chemical Categories
3.2.4 Comparison of structural alerts from the different (Q)SAR sources

4. Discussion and recommendations
4.1 OECD (Q)SAR principles
4.2 Comparing positive SARs with positive reproductive experimental data
4.3 Additional comments concerning DEREKfW and TSCA Chemical Category List
4.5 Recommendations
4.5.1 Recommendations for further work
4.6 Final conclusions

References

Appendix 1 Reproductive classified chemicals

Appendix 2 Structure alerts present in DEREK


Summary

Structure-Activity Relationships (SARs), including Quantitative SARs, are applied to the assessment of chemicals. If the use of (Q)SARs here is to be enhanced, some questions concerning their validity will need to be answered. This need is all the more urgent considering REACH (Registration, Evaluation and Authorisation of Chemicals), the proposed new EU policy on chemicals, which stresses the need for non-animal testing.

Our aim was to investigate the validity of SAR tools for predicting reproductive toxicity. Two types of SAR models were chosen for the investigation: DEREKfW and the TSCA List of the New Chemicals Program. DEREKfW (Deductive Estimation of Risk from Existing Knowledge for Windows) is a software program that predicts toxicological properties using literature-based and expert-derived structural alerts, while the TSCA (Toxic Substances Control Act) New Chemicals Program List of the US-EPA is based on expert judgment and a category approach. Both models are analysed, and their predictive structure and content are described in detail in this report.

The compliance of DEREKfW with the so-called OECD (Q)SAR principles, a set of principles developed by the Ad hoc Expert Group on (Q)SARs, was investigated. A model which is intended to be used for regulatory purposes should fulfil the following criteria:

1) be associated with a defined endpoint which it serves to predict;

2) take the form of an unambiguous and easily applicable algorithm for predicting a pharmacotoxicological endpoint;

3) ideally, have a clear mechanistic basis;

4) be accompanied by a definition of the domain of its applicability;

5) be associated with a measure of its goodness-of-fit (internal validation);

6) be assessed in terms of its predictive power by using data that were not used in the development of the model (external validation).

We screened the performance of the reproductive toxicity SARs in DEREKfW and the TSCA Chemical Category List in recognising substances classified for reproductive toxicity within the scope of Directive 67/548/EEC. This set of reprotoxic substances is listed in Appendix 1, including all structural formulas, CAS numbers and classifications. As we limited our research to substances positive for reproductive toxicity, the rate of false positives could not be assessed.

Although the OECD (Q)SAR principles were not completely straightforward to apply, it was concluded that DEREKfW partially met these principles. For the first principle (defined endpoint), the endpoints were indeed established, and the details are available in the references of the alert descriptions. However, explanatory definitions of the endpoints were missing from the model and the manual. The second principle (clear descriptors and hierarchy) is mostly fulfilled for the reproductive endpoint, as most DEREKfW alerts for this endpoint specify the substituents required in the structural formula. The third principle (clear mechanistic basis) is partly fulfilled, as structural alerts and corresponding references are available. A description of the mechanistic basis is given for most alerts, but at times it is insufficient.

The fourth principle (defined domain of applicability) was fulfilled to a certain extent. Information on the influence of other activating and deactivating substituents not directly associated with the structural alert is lacking. The fifth principle (internal validation, training set available) was considered to be fulfilled to a limited extent, as the training set is only partly available in the references and the predictions are underpinned by only a number of positive examples.

The sixth principle (external validation), explained as ‘the model has been tested with data not used for developing the model’, is fulfilled, as published data on testing DEREKfW with sets of ‘external’ substances are available.

DEREKfW did not recognize 90% of the substances classified for ‘impaired fertility’ and 79% of the substances classified for ‘harm to the unborn child’. The TSCA Chemical Category List missed 77% and 82% of the cases, respectively. These values might still be too positive, as the training set was partially unknown. Most chemicals were not detected by either method, owing to the limited number of structural alerts available and the complex mechanisms of reproductive toxicity. Apart from one shared ‘alert/category’, DEREKfW and TSCA contain 7 ‘alerts’ and 10 categories, respectively. As the alerts in DEREKfW (a comprehensible and transparent tool) and the categories in the TSCA Chemical Category List are highlighted in this research, both can be used as an additional source of expert judgment when assessing chemicals for reproductive toxicity. However, we conclude that these models cannot be the only method for screening chemicals for reproductive toxicity in the framework of REACH. Other models or testing strategies have to be used to assess the reproductive toxicity of chemicals.

At present, there is no complete collection of SARs for the reproductive endpoint. Reproductive toxicology is very complex and has several different and usually unknown mechanisms. Knowledge about this process is limited and SARs are difficult to define. An attempt to collect and describe the published structural alerts for the reproductive toxicity can be found in Hulzebos et al. (1999, 2001). It would be worthwhile to explore this area further. The predictions of TOPKAT and Multicase for the same performance set will be reported next year in co-operation with the Danish EPA.


Samenvatting (Summary)

Structure-activity relationships (SARs), including quantitative SARs, are used in the risk assessment of substances. If the use of (Q)SARs in substance assessment is to increase, questions about their validity need to be addressed. This need is all the more urgent given the proposed EU policy on substances, REACH (Registration, Evaluation and Authorisation of Chemicals), which emphasises the reduction of animal testing.

Our aim is to determine the validity of SARs for predicting reproductive toxicity. Two models were selected for this study: DEREKfW and the TSCA Chemical Category List. DEREKfW (Deductive Estimation of Risk from Existing Knowledge for Windows) is a software program that uses structural alerts based on the literature and expert judgement. The US Environmental Protection Agency (US-EPA) develops the TSCA (Toxic Substances Control Act) List for the New Chemicals Program. The purpose of the list is to recognise and group substances with shared chemical and toxicological properties. Both models were analysed with respect to structure and content.

The compliance of DEREKfW with the so-called OECD (Q)SAR principles was investigated. These principles state that (Q)SAR models should:

1) predict a defined endpoint;

2) have an unambiguous and easily applicable algorithm for making the prediction;

3) ideally, have a mechanistic basis;

4) state the domain of applicability of the QSAR method;

5) have been evaluated with a measure of goodness-of-fit (internal validation);

6) have been evaluated in terms of predictive power using data that were not used in the development of the model (external validation).

We screened the reproductive toxicity SARs in DEREKfW and the TSCA Chemical Category List on their recognition of industrial substances classified for reproductive toxicity within the scope of Directive 67/548/EEC. Because we limited our study to substances positive for reproductive toxicity, we cannot say anything about the number of false positives the models produce.

DEREKfW partially fulfils the OECD (Q)SAR principles. For the first principle (defined endpoint), the endpoints are indeed established in the references, but the endpoint is not further described in the manual or the model. The second principle (clear descriptors and a hierarchy in which the descriptors can be used) is mostly fulfilled, because DEREKfW describes the substituents that may be attached to the structural alert. The third principle is partly fulfilled, because structural alerts and references are present. The mechanism is described for most of the alerts, but the description is sometimes insufficient. The fourth principle, which describes the domain of the (Q)SAR, is partially met. The influence of other active substituents not directly attached to the structural alert is not given. Principle five (internal validation, training set available) is met to a limited extent. The training set is not described as such, but is reflected in the cited references. Examples are often available of substances with a comparable structural alert that tested positive in an experimental animal study. Principle six (external validation), interpreted as tested with an independent test set, is fulfilled.

DEREKfW did not recognise 90% of the substances classified for ‘impaired fertility’ and 81% of the substances classified for ‘harm to the unborn child’. The TSCA Chemical Category List misses 77 and 82% of the substances with these classifications, respectively. These figures may still be too optimistic, as the training set was partly unknown. Most substances were not recognised by either method, owing to the limited number of alerts/categories and the complex mechanism of reproductive toxicity. Apart from one shared ‘alert’, DEREKfW and the TSCA Chemical Category List contain a further 7 ‘alerts’ and 10 categories, respectively. Because the alerts in DEREKfW (a transparent model) and the TSCA Chemical Category List are highlighted in this study, they can be used as additional expert judgement. We conclude, however, that these models cannot be the only method for screening substances for reproductive toxicity in the framework of REACH. Other models and testing strategies are needed to assess the reproductive toxicity of substances.

At present no complete collection of SARs is available for the reproductive endpoint. Reproductive toxicology is very complex and involves many different and unknown mechanisms. Knowledge in this area is limited and SARs are difficult to define. A first attempt to collect all structural alerts for reproductive toxicity was made in Hulzebos et al. (1999, 2001). This attempt could be further extended and substantiated. The predictions of two other (Q)SAR models, TOPKAT and Multicase, for the same substances will be reported next year in cooperation with the Danish EPA.


1. Introduction

1.1 Reproductive toxicity and REACH

In reproductive toxicity studies a very large number of laboratory animals are used. For example: a standard developmental toxicity test requires at least 160 animals, not to mention the pups of the first generation. Reproductive studies are very difficult to perform, labour intensive, time consuming and expensive.

A proposal for a new chemicals policy in the European Union called REACH (Registration, Evaluation and Authorisation of Chemicals) was recently published (EC, 2003c). The main principle is that all industrial chemicals sold in quantities over one tonne per year have to be registered. If a substance is considered to be of concern (i.e. classified as a category 1 or 2 carcinogen or mutagen, as toxic to reproduction category 1 or 2, as (very) persistent, (very) bioaccumulative and toxic, or as an endocrine disrupter), it must be registered and evaluated or authorised. In practice this means that approximately 5,500 substances which are produced in large quantities or which are ‘suspect’ should be evaluated before 2008, and another 24,500 must comply with regulations by 2012 at the latest (TNO, 2002). Very few data are available for most of these substances. Of the chemicals with the highest production volume, only 14% have a base set available according to Directive 67/548/EEC (EC, 1967), 65% have less than the base set and 21% have no data at all (EC, 2003b). An estimated 2893 developmental toxicity studies and 2135 two-generation reproduction toxicity studies will have to be performed, which will cost € 852 million in total (EC, 2003b). Apart from the unacceptably large number of experimental animals needed and the financial burden, this will also exceed the capacity of the existing testing facilities. Therefore, REACH encourages the use, as far as practicable, of non-animal test methods and the development of alternative methods (EC, 2001c). One of these alternatives is the application of (Quantitative) Structure-Activity Relationships. QSARs are simplified mathematical models of complex chemical-biological interactions. SARs are qualitative relationships in the form of structural alerts: molecular substructures or fragments related to the presence or absence of activity. These theoretical models can be used to predict the physicochemical, toxicological and pharmacological properties of molecules. (Q)SARs are already widely used by the pharmaceutical industry, but only to a limited extent for the investigation of other chemicals (mainly for assessing environmental exposure risks).

1.2 Objectives

In order to enhance the application of SARs and QSARs in risk assessment, there is an urgent need to answer questions regarding the validity of the available (Q)SARs. The aim of this report is to investigate, according to the OECD principles (OECD, 2004b), the performance and validity of SAR tools that can predict reproductive toxicity. Reproductive toxicity was chosen as an endpoint because it is an important criterion for REACH and because it is a qualitative rather than a quantitative endpoint. It is also an important endpoint in view of the potential savings in resources and in experimental animals.


For this investigation two major SAR models were chosen: DEREKfW and the TSCA Chemical Category List. DEREKfW (Deductive Estimation of Risk from Existing Knowledge for Windows) is a rule-based expert system that predicts the toxicological properties of chemicals based on an analysis of their molecular structure (LHASA, 2002). The TSCA Chemical Category List was developed within the scope of the US Environmental Protection Agency's (EPA's) New Chemicals Program: chemicals with shared chemical and toxicological properties were grouped into Chemical Categories (TSCA, 2002).

The approach taken consists of two elements:

1. Investigation into the compliance of DEREKfW with the so-called OECD (Q)SAR principles (OECD, 2004a,b). This investigation was performed in a general way, for all endpoints, and subsequently more in detail for the reproductive toxicity endpoint;

2. Comparison of the DEREKfW and TSCA Chemical Category List predictions for reproductive toxicity with the chemicals classified for reproductive toxicity in the EU under Directive 67/548/EEC (listed in ‘Annex I’ of this Directive).

1.3 Structure of the report

Chapter 2 gives a general description of the computer model DEREKfW and the TSCA Chemical Category List. The same chapter presents the performance set of ‘Annex I’ substances together with the research methods. Chapter 3 evaluates DEREKfW against the OECD (Q)SAR principles and presents the performance of the DEREKfW and TSCA Chemical Category List predictions for reproductive toxicity. The last chapter presents our discussion and conclusions concerning the validity of the DEREKfW model and the performance of both models in predicting the reproductive toxicity endpoint. This chapter also presents the conclusion on how far DEREKfW fulfils the OECD (Q)SAR principles. General recommendations for improving the evaluated models and proposals for further investigation are also included.


2. Methods

Chemicals classified in the EU for reproductive toxicity (2.1 and 2.4) were run through the software program DEREKfW (2.5) and checked manually against the TSCA Chemical Category List (2.6). The goal of this exercise is to relate the predictions of both models to the EU classification, e.g. a testicular effect predicted by the model in relation to reproductive toxicity in the EU classification (2.7 and 2.9). In addition, the OECD (Q)SAR principles were applied to DEREKfW (2.8).

2.1 Annex I, Annex VI and TGD

This chapter refers to several sources that require some explanation. They are documents used in the EU for the risk evaluation of new and existing chemical substances. The background to all of them is the Dangerous Substances Directive (EC, 1967).

The Council Directive on Dangerous Substances specifies the hazard classification, packaging and labelling requirements for dangerous substances supplied in the European Union. The technical content of the Directive is contained in a number of Annexes. Two of them are used in this report:

• Annex I of Directive 67/548/EEC contains a list of harmonised classifications and labelling for substances or groups of substances, which are legally binding within the EU. The list is regularly updated through Adaptations to Technical Progress (e.g. 28th ATP) (EC, 2001a). Revised and new classifications inserted to the list are proposed by DG ENV and agreed by a Member State vote.

The DG ENV proposal is based on advice from the Commission Working Group on Classification and Labelling with participation of experts from the Member States. Their meetings are prepared, chaired and followed-up by the ECB (EC, 2001). Annex I at present contains approximately 2550 existing and 700 new substance entries (EC, 2001a, 2004b).

• Annex VI of Directive 67/548/EEC describes how to classify a substance not yet present in Annex I (EC, 2004b). It is usually referred to as the Classification and Labelling Guide. If a dangerous substance not yet included in Annex I is put on the market (as a pure product or contained in a preparation), manufacturers, importers and distributors have to self-classify the substance according to the criteria in Annex VI. Criteria for classification on the basis of the intrinsic (physical-chemical, toxicological and ecotoxicological) properties of a substance are described for each risk (R) phrase. Depending on the risk phrase, a safety (S) phrase may be required. Safety phrases give advice on how to handle a dangerous chemical (EC, 2001b).

Besides the Dangerous Substances Directive (67/548/EEC), the EU Technical Guidance Document (EC, 2003a) is used. This document supports legislation on risk assessment for human health and environment of new, existing and biocidal chemical substances. It contains information on the required tests and test strategies, calculation of emissions, fate, and consumer/worker exposure. In addition some QSAR estimations are included.


2.2 Definitions and terminology for the assessment of the model outcome

2.2.1 Definitions

Some terms used in the literature and in the present report are explained in this section.

Descriptor – A word, phrase, or alphanumeric character used to identify an item in an information storage and retrieval system (American Heritage® Dictionary, 2004).

Domain of applicability – For a SAR, the range of physicochemical properties, chemical classes or substituents for which the SAR is applicable (OECD, 2004a).

Assessment endpoint – An explicit expression of a toxic response to a substance that is used as the basis of a health evaluation. In this report ‘reproductive toxicity’ is an assessment endpoint (OECD, 2003a).

MDL molfile – A molecular structure file format. The MOL file format is used to encode chemical structures, substructures and conformations as text-based connection tables (Van de Waterbeemd et al., 1997).

SAR – Structure-Activity Relationship: an explicit description of the substructure (structural fragment or structural alert), including an explicit identification of its substituents, that underlies the reactivity of the molecule (OECD, 2004b).

SMILES notation – Simplified Molecular Input Line Entry System. A simple, concise and fairly readable format for specifying molecular structures (Weininger, 1988).

Specific endpoint – Some assessment endpoints may be expressed in several ways. For example, the assessment endpoint reproductive toxicity may be expressed as impaired fertility or developmental toxicity.

Structural fragment or structural alert – Part of a chemical structure which may be associated with a certain (toxicological) action. Together with the identified substituents, the structural fragment (structural alert) forms the SAR.

Toxophore – A segment of a molecule which is associated with a specific activity (LHASA, 2002).

2.2.2 Terminology for the assessment of the model outcome

The criteria used for judging the trustworthiness of the SARs tested in this report are derived from the US ICCVAM (Interagency Coordinating Committee on the Validation of Alternative Methods) (ICCVAM, 2002). ICCVAM, like ECVAM in Europe, is responsible, among other things, for the validation of in vitro methods for toxicological endpoints. For this purpose the results of the in vitro tests under evaluation are compared with in vivo test results. Other organisations working in this field have established similar definitions (ECETOC, 2002 and ECVAM, 2002). In the present study we compared the SAR predictions with the results of reproductive toxicity tests. The following terminology was used (ICCVAM, 2002):

• Sensitivity is defined as the proportion of all positive chemicals that are correctly classified as positive in a test.

• Specificity is defined as the proportion of all negative chemicals that are correctly classified as negative in a test.

• Accuracy (concordance) is defined as the proportion of correct outcomes of a method.

• False positive rate is defined as the proportion of all negative chemicals that are falsely identified as positive.

• False negative rate is defined as the proportion of all positive chemicals that are falsely identified as negative.
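
The definitions above can be illustrated with a minimal Python sketch. It is not part of the original study; the function names and the counts in the example are hypothetical and chosen only for illustration. Because the test set in this report contains only substances classified as positive, the sketch returns None for the measures that require negative chemicals (specificity and false positive rate).

```python
from typing import Dict, Optional


def rate(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def performance(tp: int, fn: int, tn: int, fp: int) -> Dict[str, Optional[float]]:
    """Compute the ICCVAM-style measures from a 2x2 outcome table.

    tp: positive chemicals predicted positive
    fn: positive chemicals predicted negative
    tn: negative chemicals predicted negative
    fp: negative chemicals predicted positive
    """
    positives = tp + fn
    negatives = tn + fp
    return {
        "sensitivity": rate(tp, positives),
        "specificity": rate(tn, negatives),
        "accuracy": rate(tp + tn, positives + negatives),
        "false_positive_rate": rate(fp, negatives),
        "false_negative_rate": rate(fn, positives),
    }


# Hypothetical example: 100 classified (positive) substances, 10 of which
# are flagged by a model; no negative substances are tested.
result = performance(tp=10, fn=90, tn=0, fp=0)
```

With a positives-only test set, the false negative rate equals one minus the sensitivity, which is how the percentages of substances ‘not recognised’ by a model can be read.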

2.3 The test set: Annex I substances

The term ‘reproductive toxicity’ is used in the Technical Guidance Document of the EU to describe the adverse effects induced (by a substance) on any aspect of mammalian reproduction. It covers all phases of the reproductive cycle, including impairment of male and female reproductive function or capacity and the induction of non-heritable adverse effects in the progeny, e.g. death, growth retardation, structural and functional effects (EC, 2003). According to the Classification and Labelling Guide, substances and preparations which are toxic for reproduction are defined as substances and preparations which, after inhalation, ingestion or skin penetration, may produce or increase the incidence of non-heritable adverse effects in the progeny and/or an impairment of male or female reproductive functions or capacity.

Reprotoxic chemicals, as described above, were obtained from two sources for the purpose of this performance test. One was Annex I of the 28th adaptation of Directive 67/548/EEC (EC, 2001a), as this already has official legal status. The second source was the working database of substances classified as agreed in the Commission Working Group on Classification and Labelling of July 2003 (proposal for the 29th ATP). This list of labelled substances has no official legal status yet, as discussion still has to take place (EC, 2004b); the list is expected to be made official within one year. Both databases are available on the internet at http://ecb.jrc.it/classification-labelling/. The two databases were chosen for the following reasons:

• all substances presented were evaluated according to one set of criteria, by the same experts;

• the data are not confidential;

• information on all substances is available at the ECB website (CAS number, chemical name and complete classification).

For chemicals which were included in both databases, the classification from the working database was used, because it also included additional, recently proposed classification. An overview of all chosen substances (108), including CAS numbers and classification for reprotoxic properties, is presented in Appendix 1.
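
The selection rule described above can be sketched as follows, assuming each database is represented as a mapping from CAS number to classification. This representation is an assumption made for illustration; the identifiers and classifications below are placeholders, not entries from Annex I or the working database.

```python
from typing import Dict


def build_test_set(annex_i: Dict[str, str], working_db: Dict[str, str]) -> Dict[str, str]:
    """Combine both sources; the working database wins when a substance is in both."""
    merged = dict(annex_i)      # start from the legally binding Annex I entries
    merged.update(working_db)   # overwrite with the more recent proposed classification
    return merged


# Placeholder data (hypothetical identifiers and R-phrases).
annex_i = {"CAS-A": "R61", "CAS-B": "R60"}
working_db = {"CAS-B": "R60-61", "CAS-C": "R62"}

test_set = build_test_set(annex_i, working_db)
```

Here the substance present in both sources ("CAS-B") keeps the classification from the working database, and substances present in only one source are carried over unchanged.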

2.4 Classification and labelling for reproductive toxicity

As mentioned previously (definition given in section 2.3), reproductive toxicity may be divided into two parts:

Effects on male or female fertility include adverse effects on libido, sexual behaviour, any aspects of spermatogenesis or oögenesis, hormonal activity and physiological response which could interfere with the capacity to fertilise, fertilisation itself or the developing ovum up to and including implantation (EC, 2001b). The effects are described by the risk phrases R60 (May impair fertility) and R62 (Possible risk of impaired fertility). The difference between these phrases is significant: R60 is assigned to substances which are known to impair fertility in humans or which should be regarded as if they impair human fertility; R62 is assigned to substances which cause concern for human fertility. Classification with R60 is more severe and based on more convincing evidence than classification with R62. For a detailed explanation see Table 1.


Developmental toxicity is defined in its broadest sense as any effect interfering with normal development, both before and after birth, and includes embryotoxic/fetotoxic effects, e.g. reduced body weight, growth and developmental retardation, organ toxicity, death, abortion, structural defects (teratogenic defects), functional defects, peri-postnatal defects, and impaired postnatal mental or physical development up to and including pubertal development (EC, 2001b). The effects are described by the risk phrases R61 (May cause harm to the unborn child) and R63 (Possible risk of harm to the unborn child). The difference between these phrases is significant: R61 is assigned to substances which are known to cause developmental toxicity in humans or should be regarded as if they cause developmental toxicity in humans; R63 is assigned to substances which cause concern for humans owing to possible developmental toxic effects. Classification with R61 is more severe and is based on a greater amount of evidence than classification with R63. For a detailed explanation see Table 1.

In addition, another endpoint for reproductive toxicity was taken into account in this investigation. Effects during lactation concern the toxic effect on offspring resulting only from exposure via breast milk, or the effect on the quality and quantity of milk. These effects are described by the risk phrase R64 (May cause harm to breastfed babies). For a detailed explanation see Table 1.

Table 1: R-phrases and explanation (EC, 2001b).

R-phrase | Explanation

R60 (May impair fertility) (category 1 and 2) | Substances known to impair fertility in humans. There is sufficient evidence to establish a causal relationship between human exposure to the substance and impaired fertility. Or: Substances which should be regarded as if they impair fertility in humans. There is sufficient evidence in animal studies of impaired fertility in the absence of toxic effects, or evidence of impaired fertility occurring at around the same dose levels as other toxic effects but which is not a secondary non-specific consequence of the other toxic effects.

R62 (Possible risk of impaired fertility) (category 3) | Substances which cause concern for human fertility. Generally this conclusion is based on results in appropriate animal studies which provide sufficient evidence to cause a strong suspicion of impaired fertility in the absence of toxic effects, or evidence of impaired fertility occurring at around the same dose levels as other toxic effects but which is not a secondary non-specific consequence of the other toxic effects, but where the evidence is insufficient to label the substance with R60.

R61 (May cause harm to the unborn child) (category 1 and 2) | Substances known to cause developmental toxicity in humans. There is sufficient evidence to establish a causal relationship between human exposure to the substance and the subsequent developmental toxic effect in the progeny. Or: Substances which should be regarded as if they cause developmental toxicity to humans. There is sufficient evidence to provide a strong presumption that human exposure to the substance may result in developmental toxicity, generally on the basis of clear results in appropriate animal studies where effects have been observed in the absence of signs of marked maternal toxicity, or at around the same dose levels as other toxic effects but which is not a secondary non-specific consequence of the other toxic effects.

R63 (Possible risk of harm to the unborn child) (category 3) | Substances which cause concern for humans owing to possible developmental toxic effects. Generally this conclusion is based on results in appropriate animal studies which provide sufficient evidence to cause a strong suspicion of developmental toxicity in the absence of signs of marked maternal toxicity, or at around the same dose levels as other toxic effects but which is not a secondary non-specific consequence of the other toxic effects, but where the evidence is insufficient to label the substance with R61.

R64 (May cause harm to breastfed babies) | Substances which are not classified as toxic to reproduction but which cause concern due to toxicity when transferred to the baby during the period of lactation. This R-phrase may also be appropriate for substances which affect the quantity or quality of the milk.

Summarising, reprotoxic chemicals were extracted from Annex I of the 28th adaptation of Directive 67/548/EEC (EC, 2001a) and from the working database of substances classified as agreed in the Commission Working Group of Classification and Labelling (proposal for the 29th ATP), using the R-phrases R60, R61, R62, R63 and R64.

2.5 DEREKfW – knowledge based system

2.5.1 General information on DEREKfW

DEREKfW (Deductive Estimation of Risk from Existing Knowledge) is a rule-based expert system for predicting the toxicological properties of chemicals based on an analysis of their molecular structure (LHASA, 2002). The predictions are based on the following criteria:

• Structural alerts. The term structural alert refers to one or a combination of structural features in a molecule and gives a signal that a particular toxic effect may occur;

• Species;

• Toxicity data;

• Toxicological endpoint;

• Physico-chemical properties (e.g. log Kp for skin permeability and molecular weight).

These criteria are combined into so-called reasoning rules. Reasoning rules have the following form (DEREKfW user guide, 2002):

If [Grounds] is [Threshold] then [Proposition] is [Force].

- Grounds is the evidence to be considered by the reasoning rule (for example SAR for certain toxicological endpoint);

- Threshold is the level above which the grounds must be for the proposition to be assigned the force (for example cut-off values or limitations considering the structure);

- Proposition is the outcome of the reasoning rule (for example a chemical is considered to be developmental toxicant);

- Force is the likelihood of the reasoning rule outcome (for example ‘plausible’, see section 2.5.3 for further explanation of likelihood levels).
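The four-part rule template can be made concrete with a minimal Python sketch. This is only an illustration of the template, not DEREKfW's internal representation: the class name `ReasoningRule`, its methods and the example values are hypothetical and are not taken from the DEREKfW knowledge base.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReasoningRule:
    """One rule of the form: If [Grounds] is [Threshold] then [Proposition] is [Force]."""
    grounds: str      # evidence considered by the rule
    threshold: str    # level the grounds must reach
    proposition: str  # outcome of the rule
    force: str        # likelihood assigned to the outcome

    def render(self) -> str:
        """Return the rule written out in the template form."""
        return (f"If [{self.grounds}] is [{self.threshold}] "
                f"then [{self.proposition}] is [{self.force}].")

    def conclude(self, threshold_met: bool) -> Optional[Tuple[str, str]]:
        """Return (proposition, force) if the grounds reach the threshold, else None."""
        return (self.proposition, self.force) if threshold_met else None


# Hypothetical example values, chosen only to illustrate the template.
rule = ReasoningRule(
    grounds="structural alert for developmental toxicity",
    threshold="present",
    proposition="developmental toxicity",
    force="plausible",
)
print(rule.render())
```

In this sketch the check of whether the grounds reach the threshold is left to the caller (`threshold_met`), because the report does not describe how DEREKfW evaluates it.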

As shown in Figure 1, a reasoning rule leads to a conclusion which, sometimes in combination with other rules, leads to a toxicity prediction and the likelihood thereof. All outcomes are peer reviewed by expert toxicologists and are supported by literature references (Greene et al., 1999).


Figure 1: Example of reasoning rules for developmental toxicity.

2.5.2 Predicting part and Editor part

DEREKfW is divided into two main parts:

Part 1: Prediction of the toxicological properties of a chemical.

• Most endpoints are directly associated with a structural fragment, if a toxophore has been detected in the examined molecule. In other words, a toxophore (SAR) forms the Grounds for the reasoning rule. Figure 2 presents an example of a SAR-based prediction for developmental toxicity, with a likelihood of ‘plausible’ (see 2.5.3). Structural fragments including domains, comments and references are always given for this type of prediction. As Figure 2 shows, examples are not available in this case.

Figure 2: Example of a DEREKfW prediction for developmental toxicity, based on a structural fragment.

• Some endpoints are predicted by the reasoning engine and are associated with a reasoning rule, but not directly with the presence of a structural alert within the examined molecule. Figure 3 presents an example of a prediction based on a reasoning rule. In this case the permeability coefficient for dermal absorption (log Kp) forms the Grounds for the reasoning rule. No references are given. The reasoning for this prediction was that the log Kp estimated by DEREKfW for the tested substance was within the required domain.


Figure 3: Example of DEREKfW prediction for skin sensitisation, based on the reasoning rule.

Part 2: Editor part. Here all endpoints and connected structural alerts are listed. If a known positive substance is incorrectly predicted, the researcher can check in the editor part whether a certain structural alert is present for the investigated endpoint. In addition, there are tools for users to add their own in-house alerts or reasoning rules.

The program covers the following toxicological endpoints: carcinogenicity, mutagenicity, skin sensitisation, teratogenicity, irritation, and respiratory sensitisation. Definitions of endpoints are not available in the model itself or in the manual. For the complete overview of the endpoints see Appendix 2.

It is possible to choose the species for which predictions are required e.g. humans, mammals (rat, mouse, guinea pig, hamster and primate) and bacteria. The chemical structures can be imported into DEREKfW via its automatic link to ISIS/Draw (computer model for drawing chemical structures) or by importing MDL Molfiles.

2.5.3 Likelihood levels in DEREKfW

When a structural alert or a match for a reasoning rule is found by DEREKfW in a molecule, the expected endpoint is indicated. The reliability of these predictions is presented in the form of one of eight levels of likelihood (e.g. see Figure 2 and 3). They are listed in the table below (LHASA, 2002).

Predictions associated with a structural alert are more transparent than predictions based on other grounds (Kp, molecular weight). In the first case references, examples and domain are available. In the second case only a cut-off value or domain is available, without references or further explanation. Note: Version 8.0 does contain references for some of these rules (Personal communication with LHASA).


Table 2: Levels of likelihood from DEREKfW and their definition.

Level of likelihood | Definition

Certain | There is proof that the proposition is true

Probable | There is at least one strong argument that the proposition is true and there are no arguments against it

Plausible | The weight of evidence supports the proposition

Equivocal | There is an equal weight of evidence for and against the proposition

Doubted | The weight of evidence opposes the proposition

Improbable | There is at least one strong argument that the proposition is false and there are no arguments that it is true

Open | There is no evidence that supports or opposes the proposition

Contradicted | There is proof both that the proposition is true and that it is false

2.5.4 DEREKfW alerts for reproductive toxicity

There are 9 structural alerts included in DEREKfW for the reproductive toxicity endpoints. DEREKfW contains 3 alerts for developmental toxicity, 5 alerts for teratogenicity, 1 alert for testicular toxicity:

1. Polyalkyl urea: developmental toxicity, teratogenicity in rat foetus;

2. Monothioglycol or glycol monoalkyl ether, alkoxy- or alkylthio-carboxylic acid or precursors: developmental toxicity, teratogenicity/foetotoxicity;

3. Benzidine-based bisazo compound: developmental testicular toxicity (testis weight and enumeration of atrophic tubules);

4. Thalidomide-type compound: teratogenicity;

5. Short chain carboxylic acid or precursor: teratogenicity;

6. Pyrroline ester, pyrroline N-oxide ester, pyrrole ester or pyrrole alcohol: teratogenicity;

7. Triazole antifungal analogue: teratogenicity;

8. Retinoid or analogue: teratogenicity;

9. Monothioglycol or glycol monoalkyl ether, alkoxy- or alkylthio-carboxylic acid or precursors: testicular toxicity, testicular atrophy.

2.6 TSCA Chemical Categories List

The US Environmental Protection Agency (EPA) New Chemicals Program was established to help manage the potential risk from new chemicals. In 1987, notified chemicals with shared chemical and toxicological properties were grouped into categories, the so called Chemical Categories (TSCA, 2002). Currently, there are a total of 45 categories, listed in the TSCA report. A category statement contains:

• description of the molecular structure;

• boundary conditions such as molecular weight, equivalent weight, the log of the octanol/water partition coefficient (log P) and water solubility;

• standardised hazard and fate tests to address concerns for the category.

TSCA chemical category for reproductive toxicity

The following Categories are connected to reproductive toxicity:

1. Acrylamides: reproductive and developmental toxicants. There is no concern for chemicals with a MW of > 5000 and there is concern if MW is < 1000. The chemicals with a MW between 1000 and 5000 are assessed on a case by case basis;


2. Anhydrides, Carboxylic Acid: concern for potential developmental or reproductive toxicity based on data for maleic, succinic, and phthalic anhydrides;

3. Dianilines: potential retinotoxic agents by analogy to 4,4'-methylenedianiline, 4,4'-oxydianiline and the diaminodiphenyl alkane drugs; they are also potential reproductive and systemic toxicants by analogy to 4,4'-methylenedianiline;

4. Benzotriazole-hindered phenols: Reproductive toxicity, including atrophy of the seminal vesicles, significant reduction in absolute and relative testes weight, significant reduction in absolute and relative prostate weight, and abnormal spermatogenesis. Boundaries cannot be given;

5. Boron compounds: reproductive toxicity (i.e., sterility in males and females, and testicular atrophy in males). Nothing is mentioned on boundaries;

6. Epoxides: reproductive effects. Epoxides with a MW > 1000 are not expected to cause concern;

7. Ethylene Glycol Ethers: short-chain ethylene glycol ethers are developmental and reproductive toxicants. No concern is expected if the alkyl chain is > C7;

8. Hindered Amines: toxic to male reproductive system;

9. Nickel Compounds: fetotoxicity. Ni2+ needs to be released for the effect;

10. Triarylmethane Pigments/Dyes with non-solubilizing Groups: developmental and reproductive toxicity. Dyes with soluble groups are not expected to be of concern, neither are insoluble pigments/dyes (1 ppb);

11. Vinyl Esters: Reproductive toxicity.

The list gives boundary conditions (e.g. molecular weight) that determine whether a category can be used for prediction.
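The boundary conditions quoted above for acrylamides, epoxides and ethylene glycol ethers can be written as simple checks. The sketch below is illustrative only: the function names are hypothetical, and the treatment of values lying exactly on a boundary (MW of exactly 1000 or 5000, chain length of exactly C7) is an assumption, since the category statements as summarised here do not specify it.

```python
def acrylamide_concern(mw: float) -> str:
    """Acrylamides: concern if MW < 1000, no concern if MW > 5000,
    otherwise assessed case by case (boundary values assumed case by case)."""
    if mw < 1000:
        return "concern"
    if mw > 5000:
        return "no concern"
    return "case by case"


def epoxide_concern_expected(mw: float) -> bool:
    """Epoxides: MW > 1000 is not expected to cause concern."""
    return mw <= 1000


def glycol_ether_concern_expected(alkyl_chain_carbons: int) -> bool:
    """Ethylene glycol ethers: no concern expected if the alkyl chain is > C7."""
    return alkyl_chain_carbons <= 7


print(acrylamide_concern(71.08))   # acrylamide monomer, MW about 71
print(acrylamide_concern(3000.0))
```

Such checks only decide whether a category's boundary conditions are met; whether the substance belongs to the structural class at all still has to be judged from its structural formula.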

2.7 DEREKfW – Performance procedure

2.7.1 Technical details of testing DEREKfW

A special step-by-step procedure was introduced to avoid errors in the sketching of the structures of the chemicals. The SMILES notation of each chemical was produced from the CAS number by EPIwin V3.10. The SMILES notation was copied into ACD/Chem Sketch Freeware v5.11. Chem Sketch (a computer drawing program for chemical structures) generated a structure from the SMILES notation. An MDL molfile was exported from Chem Sketch and imported into DEREKfW 6.0 (of 2002/3; the latest updated version, released in 2004, is version 8.1). A prediction was considered positive if an alert was found by DEREKfW for testicular or developmental toxicity or teratogenicity. All inorganic substances were left out, as no alerts for reproductive toxicity are present in DEREKfW for this group of substances. Some other substances were not used because only the name of the class was given or because the structural formula could not be reproduced. Overall, 108 substances were tested (see Appendix 1).

2.7.2 Comparison DEREKfW versus Annex VI definitions

The following reprotoxic endpoints are predicted in DEREKfW:

1. Developmental toxicity 2. Teratogenicity

3. Testicular toxicity

These endpoints are not further specified or defined in DEREKfW. The terms used in the Annex VI classification guideline (EC, 2001b) and in DEREKfW differ. Therefore, in this paragraph the terms used in DEREKfW and the Annex VI definitions are compared and brought in line.

The DEREKfW endpoint testicular toxicity falls within the criteria for fertility (R60 and R62). However, the Annex VI criteria for fertility also include effects on females and other effects that could influence fertility, such as effects on libido (EC, 2001b). For the purpose of this performance test, all substances which were recognised by DEREKfW as testicular toxicants were regarded as impairing fertility and as comparable with R60/62.

The DEREKfW endpoints teratogenicity and developmental toxicity fall within the Annex VI criteria for developmental toxicity (R61 and R63). For the purpose of this performance test, all substances which were recognised by DEREKfW as teratogenic or developmental toxicants were regarded as comparable with this labelling.

No endpoint comparable to effects on lactation (R64) is available in DEREKfW. Therefore, no comparison with DEREKfW could be made.

Table 3: Comparison of classification from Annex I (EC, 2001a) and DEREKfW.

Classification Annex I | Terms used in DEREKfW

R60 (May impair fertility), R62 (Possible risk of impaired fertility) | Testicular toxicity

R61 (May cause harm to the unborn child), R63 (Possible risk of harm to the unborn child) | Teratogenicity, Developmental toxicity
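The correspondence in Table 3 can be expressed as a small lookup. The following Python sketch is one possible encoding; the names `DEREK_TO_R_PHRASES` and `is_comparable` are hypothetical and serve only to show how a DEREKfW endpoint is compared with the R-phrases assigned to a substance.

```python
# DEREKfW endpoint -> Annex I risk phrases regarded as comparable (Table 3).
DEREK_TO_R_PHRASES = {
    "testicular toxicity": frozenset({"R60", "R62"}),
    "teratogenicity": frozenset({"R61", "R63"}),
    "developmental toxicity": frozenset({"R61", "R63"}),
}


def is_comparable(derek_endpoint: str, annex_phrases: set) -> bool:
    """True if the DEREKfW endpoint corresponds to at least one of the
    R-phrases with which the substance is classified."""
    comparable = DEREK_TO_R_PHRASES.get(derek_endpoint.lower(), frozenset())
    return bool(comparable & set(annex_phrases))


print(is_comparable("Teratogenicity", {"R61"}))
print(is_comparable("Testicular toxicity", {"R63"}))
```

R64 does not appear in the lookup because no DEREKfW endpoint corresponds to effects on lactation, so a substance classified only with R64 never yields a comparable prediction in this sketch.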

2.7.3 Interpretation of DEREKfW predictions

For all chemicals recognised by DEREKfW as having a structural alert for reproductive toxicity, the prediction was given the likelihood level ‘plausible’. Information on the background of the predictions is given in the rules. The reliability was judged by careful examination of the related rule, references and example compounds. We accepted all ‘plausible’ predictions as positive predictions.
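The decision rule used in this study (an alert for one of the three reprotoxic endpoints, at likelihood level ‘plausible’, counts as a positive prediction) can be sketched as follows. The names `Likelihood`, `REPRO_ENDPOINTS` and `is_positive` are hypothetical. Only ‘plausible’ is treated as positive by default, because that is the only level that occurred in this study; how other levels should be scored is left open here and can be set through the `positive_levels` parameter.

```python
from enum import Enum
from typing import FrozenSet, Iterable, Tuple


class Likelihood(Enum):
    """The eight likelihood levels listed in Table 2."""
    CERTAIN = "certain"
    PROBABLE = "probable"
    PLAUSIBLE = "plausible"
    EQUIVOCAL = "equivocal"
    DOUBTED = "doubted"
    IMPROBABLE = "improbable"
    OPEN = "open"
    CONTRADICTED = "contradicted"


REPRO_ENDPOINTS = frozenset(
    {"developmental toxicity", "teratogenicity", "testicular toxicity"}
)


def is_positive(
    predictions: Iterable[Tuple[str, Likelihood]],
    positive_levels: FrozenSet[Likelihood] = frozenset({Likelihood.PLAUSIBLE}),
) -> bool:
    """True if any prediction concerns a reprotoxic endpoint at an accepted level."""
    return any(
        endpoint.lower() in REPRO_ENDPOINTS and level in positive_levels
        for endpoint, level in predictions
    )


print(is_positive([("Teratogenicity", Likelihood.PLAUSIBLE)]))
```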

2.8 OECD (Q)SAR principles for (Q)SAR computer models

In March 2002 a workshop was held in Setubal (Portugal) on the use of (quantitative) structure-activity relationships for regulatory purposes, as one of the components of chemical safety assessment (ECETOC, 2002). Any model used for regulatory purposes should be scientifically valid, appropriate for the purpose intended, reliable and accepted by decision-makers. To allow existing models to be screened for their usefulness, the following principles were developed by the Ad Hoc Expert Group on (Q)SARs. A model should:

1) be associated with a defined endpoint which it serves to predict;

2) take the form of an unambiguous and easily applicable algorithm for predicting a (pharmaco)toxicological endpoint;

3) have a mechanistic interpretation, if possible;

4) be accompanied by a definition of the domain of its applicability;

5) be associated with a measure of its goodness-of-fit (internal validation);

6) be assessed in terms of its predictive power by using data that were not used in the development of the model (external validation).

A more recent document became available in late 2004, in which the order of the principles is changed (a mechanistic interpretation is the last principle) and principles five and six are combined (OECD, 2004b). These OECD (Q)SAR principles are applied to DEREKfW.

2.9 The TSCA Chemical Category List, performance procedure

In addition to testing DEREKfW, we tried to predict the reprotoxic substances from Annex I using the TSCA Chemical Category List (TSCA, 2002). The structural formula of every Annex I chemical was compared with the structural alert and explanation given in the Chemical Category document. Cut-off values for the chemical structure and molecular weight were also taken into account. If a compound could be classified as a member of a TSCA Chemical Category, and the Annex I classification and labelling was comparable to the effects predicted for that category, we considered the prediction as positive. The terms used in the TSCA Chemical Category List were related to the Annex I classification and labelling risk phrases as follows:

Table 4: Comparison of classification from Annex I (EC, 2001a) and the TSCA Chemical Category List.

Classification Annex I | Terms used in TSCA Chemical Category List

R60 (May impair fertility), R62 (Possible risk of impaired fertility) | Toxic to male reproductive system, Sterility in males and females, Testicular atrophy in males, Reproductive toxicity

R61 (May cause harm to the unborn child), R63 (Possible risk of harm to the unborn child) | Fetotoxicity, Reproductive toxicity, Developmental toxicity

The term ‘reproductive toxicity’ was understood as both impairing fertility and harmful to the unborn child, if no further specification was given in the TSCA Chemical Category List. In addition, the authors of this report assumed that, when the TSCA Chemical Category List states that chemicals are possibly toxic to reproduction, the NOAEL for reproductive toxicity is below 1000 mg/kg bw, as this is the maximum dose that needs to be tested according to the OECD reproductive toxicity guidelines.
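The scoring procedure for the TSCA Chemical Category List described in this section can be summarised in a short sketch. It assumes that category membership has already been judged by hand from the structural formula and the cut-off values; the names `TSCA_TERM_TO_R_PHRASES` and `tsca_prediction_positive` are hypothetical. The lookup follows Table 4, with the unspecified term ‘reproductive toxicity’ covering both fertility and developmental effects.

```python
FERTILITY = frozenset({"R60", "R62"})
DEVELOPMENT = frozenset({"R61", "R63"})

# Terms used in the TSCA Chemical Category List -> comparable R-phrases (Table 4).
TSCA_TERM_TO_R_PHRASES = {
    "toxic to male reproductive system": FERTILITY,
    "sterility in males and females": FERTILITY,
    "testicular atrophy in males": FERTILITY,
    "fetotoxicity": DEVELOPMENT,
    "developmental toxicity": DEVELOPMENT,
    # Unspecified 'reproductive toxicity' is read as both kinds of effect.
    "reproductive toxicity": FERTILITY | DEVELOPMENT,
}


def tsca_prediction_positive(is_category_member, category_terms, annex_phrases):
    """Positive if the substance is a category member and the effects predicted
    for the category are comparable to its Annex I classification."""
    if not is_category_member:
        return False
    predicted = set()
    for term in category_terms:
        predicted |= TSCA_TERM_TO_R_PHRASES.get(term.lower(), frozenset())
    return bool(predicted & set(annex_phrases))


# A member of a category described with 'fetotoxicity', classified with R61.
print(tsca_prediction_positive(True, ["Fetotoxicity"], {"R61"}))
```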


3. Results

3.1 DEREKfW versus OECD (Q)SAR principles

In this chapter the explanation of and comments to the OECD (Q)SAR principles are presented as they were in their draft form, together with the results of the DEREKfW evaluation using these criteria. A preliminary evaluation of DEREKfW was performed for all endpoints. Subsequently the endpoint reproductive toxicity was evaluated in a more detailed manner.

Setubal principle 1: A model should be associated with a defined endpoint which it serves to predict. A well defined (eco)toxicological endpoint (for example developmental toxicity, sensitization, irritation and corrosivity) should be present, which has clear relevance for a defined purpose. In this case the purpose is the preliminary screening of substances in order to select reprotoxic substances. The background information (i.e. experimental conditions and conditions of the performed tests) should be available (OECD, 2004b).

Comments concerning Setubal principle 1:

General comments: This model contains 36 endpoints (see Appendix 2 for the overview table). Definitions of these endpoints are not available in DEREKfW and need to be retrieved from the references given together with the predictions. These toxicological endpoints are based on different numbers of structural alerts, varying from one to 77. Only a few endpoints are based on a significant number of structural alerts:

• Mutagenicity: 77 structural alerts;

• Skin sensitisation: 61 structural alerts;

• Carcinogenicity: 46 structural alerts;

• Skin irritation: 25 structural alerts;

• Eye irritation: 29 structural alerts;

• Respiratory tract irritation: 19 structural alerts;

• Thyroid toxicity: 14 structural alerts;

• Respiratory sensitisation: 13 structural alerts.

For the other toxicological endpoints fewer than 10 structural alerts are given. For 13 endpoints only one structural alert is present.

Specific comments for reproductive toxicity: For the general endpoint reproductive toxicity there are three more specific endpoints available: developmental toxicity (3 structural alerts), teratogenicity (5 structural alerts) and testicular toxicity (1 structural alert). For examples and the number of available references see Table 5.


Table 5: Reproductive toxicity in DEREKfW.

Endpoint | Structural alert (alert description) | Number of references | Examples of active substances

Developmental toxicity | Polyalkyl urea | 3 | N,N’-dimethylurea, N,N,N’-trimethylurea and tetramethylurea

Developmental toxicity | Monothioglycol or glycol monoalkyl ether, alkoxy- or alkylthio-carboxylic acid or precursors | 1 | No examples

Developmental toxicity | Benzidine-based bisazo compound | 1 | Chlorazol black E, Congo red, Diamine blue

Teratogenicity | Thalidomide-type compound | 3 | Thalidomide

Teratogenicity | Short chain carboxylic acid or precursor | 6 | No examples

Teratogenicity | Pyrroline ester, pyrroline N-oxide ester, pyrrole ester or pyrrole alcohol | 5 | No examples

Teratogenicity | Triazole antifungal analogue | 1 | No examples

Teratogenicity | Retinoid or analogue | 4 | No examples

Testicular toxicity | Monothioglycol or glycol monoalkyl ether, alkoxy- or alkylthio-carboxylic acid or precursors | 1 | No examples

Conclusion DEREKfW versus Setubal 1 principle: There are several toxicological endpoints available in DEREKfW. However, definitions of these endpoints are given only in the references supplied with the predictions; they are not available in the manual or the programme. The number of structural alerts varies between endpoints from one to 77. Titles of references containing background information are always available, but examples of active substances are missing in many cases. For the general endpoint reproductive toxicity only a limited number of structural alerts is available.

Additional conclusion: The authors of this report noted that it was rather difficult to make a clear distinction between Setubal criteria 2 and 4.

Setubal principle 2: A model should take the form of an unambiguous and easily applicable algorithm for predicting a pharmaco-toxicological endpoint. (…) Individual structural alerts should not be considered for validation in isolation, but an integrated approach should be taken assessing all the rules at the same time in the context of their hierarchy (OECD).

Setubal principle 4: A model should be accompanied by a definition of the domain of its applicability. (…) In the case of a SAR, information should be given on whether the substructure is associated with any inclusion and/or exclusion rules on its applicability to groups of chemicals.

In both cases it might be understood that other substituents, next to the toxophore itself, should also be taken into account. A distinction may be made between substituents accompanying the toxophore directly and other active substituents present in the molecule that may influence the toxicity. Setubal principles 2 and 4 show some overlap where SARs are concerned: Setubal principle 2 requires the description of the structural alerts including the substructural environment. This is closely related to Setubal principle 4, under which it also needs to be described which atoms related to the alert belong to the domain of the alert and which do not.


Setubal principle 2: A model should take the form of an unambiguous and easily applicable algorithm for predicting a pharmacotoxicological endpoint. An explicit description of the substructure, including an explicit identification of its substituents should be present. (OECD, 2004).

Comments concerning Setubal principle 2:

General comments: In DEREKfW the relevant part of the molecule, being a structural alert valid for a certain specific endpoint, is marked red. If a structural alert is found by DEREKfW in a molecule and the domain requirements are met, the expected endpoint is indicated. Indications whether the substructure is associated with any inclusion and/or exclusion rules on its applicability to groups of chemicals are usually available. Cut-off values (for example, chain length and certain substituents) for the structural alerts are taken into account in most cases.

Specific comments for reproductive toxicity: For the reproductive toxicity endpoint additional requirements for the substructures accompanying the toxophore are given for the following structural alerts:

• Polyalkyl urea

• Monothioglycol or glycol monoalkyl ether, alkoxy- or alkylthio-carboxylic acid or precursors

• Benzidine-based bisazo compound

• Thalidomide-type compound

• Short chain carboxylic acid or precursor

• Pyrroline ester, pyrroline N-oxide ester, pyrrole ester or pyrrole alcohol

• Retinoid or analogue

• Monothioglycol or glycol monoalkyl ether, alkoxy- or alkylthio-carboxylic acid or precursors

For the reproductive toxicity endpoint no domain was given for Triazole antifungal analogue, only a general structural formula, without additional requirements or explanations available. The most detailed domain description is available for ‘Short chain carboxylic acid or precursor’ and for ‘Monothioglycol or glycol monoalkyl ether, alkoxy- or alkylthio-carboxylic acid or precursors’. For other structural alerts only short indications about the possible substituents, required for the alert to fire together with a general structural fragment were given.

Conclusion DEREKfW versus Setubal 2 principle: DEREKfW does contain information about the substituents required for a particular toxophore to be positively identified, but not for all structural alerts. It was concluded that in general individual structural alerts are not taken into account in isolation, but there are exceptions. For example, for the reproductive toxicity alert ‘Triazole antifungal analogue’ nothing except the general structural formula is given; it is therefore concluded that this toxophore was taken into account in isolation.

DEREKfW predicts correctly according to its own definitions.

Setubal principle 3: A model should, ideally, have a clear mechanistic basis. In the case of a SAR, there should be a description of the molecular events that underlie the reactivity of the molecule. References should be present (OECD, 2004b). Examples of substances, on which a SAR was based should always be available. Authors of the present report understand ‘molecular event’ to be the interactions between the molecule and the target receptor/organ.


Comments concerning Setubal principle 3:

General comments: A proposal for the mechanistic background is available in DEREKfW in a few cases. Examples of substances that are proven to have a certain toxicological action are presented in some cases.

Specific comments for reproductive toxicity: For the reproductive toxicity endpoint the mechanistic basis is given for two of the nine available structural alerts:

• Short chain carboxylic acid or precursor;

• Pyrroline ester, pyrroline N-oxide ester, pyrrole ester or pyrrole alcohol.

Examples are given for three structural alerts:

• Polyalkyl urea;

• Benzidine-based bisazo compound;

• Thalidomide-type compound.

General structural formulas are available for all structural alerts. For all available structural alerts references are available.

Conclusion DEREKfW versus Setubal 3 principle: The description of the mechanistic basis is not adequate for several structural alerts in DEREKfW, and examples of active substances are sometimes missing. On the other hand, references are always present.

Setubal principle 4: A model should be accompanied by a definition of the domain of its applicability, for example the range of physico-chemical properties or the classes of chemicals for which it is applicable. In the case of an SAR, information should be given on its applicability to groups of chemicals, taking into account whether the substructure is associated with any inclusion and/or exclusion rules. In addition, the modulatory effects of the substructure’s molecular environment should be taken into account (OECD, 2004b). The authors of the present report understand the term ‘substructure’s molecular environment’ to mean other activating or deactivating parts of the molecule.

Comments concerning Setubal principle 4:

General comments: DEREKfW can assess organic chemicals and some metals. DEREKfW always points out the structural alert, if recognised and if within the chemical domain. Molecular weight and lipophilicity are taken into account for some predictions. The presence of other active substituents in the molecule, not directly associated with the structural alert, is sometimes taken into account. For some structural alerts requirements for the whole molecule are given; for others only substituents directly associated with the toxophore are taken into account.

The reliability of the predictions is presented in the form of one of the eight levels of likelihood. These levels of likelihood depend on the species for which the prediction was made and in some cases on the molecular weight and/or predicted lipophilicity of the tested molecule. In other words, the level of likelihood is a combination of the presence of the structural fragment, the species in which the defined effect was proved, the species for which the prediction is required and some physico-chemical properties. Examples are given in Figures 4, 5 and 6. Information on the background of the predictions is given in the form of rules, and the reliability should be judged by careful examination of the related rule.

Specific comments for reproductive toxicity: For reproductive toxicity all positive results were given at the ‘plausible’ likelihood level. Figures 4, 5 and 6 present the rules given for each specific endpoint:


Figure 4: Rules for the developmental toxicity specific endpoint.

Figure 5: Rules for the teratogenicity specific endpoint.

Figure 6: Rules for the testicular toxicity specific endpoint.

For the reproductive toxicity endpoint, as for the other endpoints, activating or deactivating substituents not directly associated with the structural alert were not always taken into account. No substructural environment was given for ‘Triazole antifungal analogue’. The most detailed domain description is available for ‘Short chain carboxylic acid or precursor’ and ‘Monothioglycol or glycol monoalkyl ether, alkoxy- or alkylthio-carboxylic acid or precursors’.

Conclusion DEREKfW versus Setubal 4 principle: DEREKfW partly fulfils the Setubal principle 4, as substituents not directly associated with the structural alert were not always taken into account.
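As an illustration of what an explicit applicability domain for a single structural alert could look like, the following minimal Python sketch attaches optional molecular weight and lipophilicity ranges to an alert and lets the alert fire only inside those ranges. It is not DEREKfW's implementation: the class `AlertDomain`, the function `alert_applies`, the alert name and the numerical limits are all hypothetical, and the substructure match itself is passed in as a boolean because substructure searching would require a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AlertDomain:
    """Hypothetical domain description attached to one structural alert."""
    name: str
    # None means that no limit is stated for this property.
    mol_weight_range: Optional[Tuple[float, float]] = None
    log_p_range: Optional[Tuple[float, float]] = None


def in_range(value: float, bounds: Optional[Tuple[float, float]]) -> bool:
    """True when no bounds are stated or the value lies within them."""
    if bounds is None:
        return True
    low, high = bounds
    return low <= value <= high


def alert_applies(domain: AlertDomain, has_substructure: bool,
                  mol_weight: float, log_p: float) -> bool:
    """The alert fires only if the substructure is present AND the
    molecule lies inside the stated physico-chemical domain."""
    return (has_substructure
            and in_range(mol_weight, domain.mol_weight_range)
            and in_range(log_p, domain.log_p_range))


# Hypothetical alert with an assumed molecular weight limit (illustration only).
example = AlertDomain("hypothetical alert", mol_weight_range=(0.0, 500.0))
print(alert_applies(example, True, 320.0, 2.1))   # inside the assumed domain
print(alert_applies(example, True, 650.0, 2.1))   # outside the assumed domain
```

Stating the domain as data rather than leaving it implicit makes it possible to see, per alert, whether requirements exist for the whole molecule or only for the substructure.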

Setubal principle 5: A model should be associated with a measure of its goodness-of-fit (internal validation). There should be access to the training and validation data set as well as to the methods used for the development and validation of the model. This training set should include details of chemical names, structural formulas, CAS number (if available) and data for all background information, needed for the reliable interpretation of (Q)SAR (OECD, 2004).

Comments concerning Setubal principle 5:

General comments: A training set is not available for DEREKfW. Several positive and negative chemicals are used to establish the SAR, but only some positive examples are available to the user, as some data are confidential. Rule descriptions and references are present in the program. The references contain part of the training set. For example, the ECETOC report on glycol ethers, which gives the training set, is referenced for the alert ‘Monothioglycol or glycol monoalkyl ether’ (ECETOC, 1995). The examples of positive substances are given together with predictions as ‘Alert description’. These data are also given in the editor part.


Specific comments for reproductive toxicity: For the reproductive toxicity endpoint, positive examples of chemicals on which the SAR is based are given for three structural alerts:

• Polyalkyl urea,
• Benzidine-based bisazo compound and
• Thalidomide-type compound.

No examples are given for the other structural alerts.

Conclusion DEREKfW versus Setubal 5 principle: A the training set is not available for all structural alerts, it was concluded that Setubal principle 4 is only partly fulfilled.

Setubal principle 6: A model should be assessed in terms of its predictive power by using data that were not used in the development of the model (external validation). For validation of models detecting several (eco)toxicological endpoints, the validation should be performed per endpoint. The following parameters should be given:

• the number of test structures;
• the identities of the test structures;
• the approach for selecting the test structures;
• the statistical analysis of the predictive performance of the model (e.g. including sensitivity, specificity, and positive and negative predictions for classification models).

According to the authors of the present paper, the external validation of a SAR model should not be performed for a model in general. More specific goals should be defined: for example, the validation of a SAR model should be conducted for a well-defined endpoint containing well-defined structural alerts. The following validation set may be proposed:

- positive substances, containing tested SAR;
- negative substances, containing tested SAR;
- negative substances, not containing tested SAR.

Comments concerning Setubal principle 6:

General comments: As there is quite some debate on the principles of external validation, we prefer the word performance instead of validation. Assessing the performance of DEREKfW for reproductive toxicity on industrial chemicals was one of the goals of this project. This performance assessment was carried out for the reproductive toxicity endpoint using 108 substances. All identities of tested structures are presented in Appendix 1. The approach for selecting substances is given in chapter 2, ‘Methods’.
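The statistical parameters named under Setubal principle 6 can be computed from the counts of correct and incorrect predictions. The Python sketch below is a minimal illustration, not the procedure used in this report: the names `ConfusionCounts` and `performance` and all counts are hypothetical. It also shows why a test set containing only positive substances yields a sensitivity but no specificity or predictive values.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConfusionCounts:
    """Counts from comparing model predictions with experimental classification."""
    true_pos: int        # classified substances flagged by the model
    false_neg: int       # classified substances missed by the model
    true_neg: int = 0    # non-classified substances not flagged
    false_pos: int = 0   # non-classified substances flagged


def performance(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    """Return performance statistics; None marks a value that the
    test set cannot support (e.g. specificity without negatives)."""
    n_pos = c.true_pos + c.false_neg
    n_neg = c.true_neg + c.false_pos
    both_classes = n_pos > 0 and n_neg > 0
    flagged = c.true_pos + c.false_pos
    unflagged = c.true_neg + c.false_neg
    return {
        "sensitivity": c.true_pos / n_pos if n_pos else None,
        "specificity": c.true_neg / n_neg if n_neg else None,
        # Predictive values are only meaningful when both classes are present.
        "positive_predictivity":
            c.true_pos / flagged if both_classes and flagged else None,
        "negative_predictivity":
            c.true_neg / unflagged if both_classes and unflagged else None,
    }


# Hypothetical counts, not results from this report.
positives_only = performance(ConfusionCounts(true_pos=20, false_neg=80))
mixed_set = performance(ConfusionCounts(true_pos=20, false_neg=80,
                                        true_neg=90, false_pos=10))
print(positives_only)
print(mixed_set)
```

Returning `None` rather than a number for the unsupported statistics keeps a positives-only assessment from reporting a trivially perfect positive predictivity.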

3.2 Performance of DEREKfW and the TSCA Chemical Category List

3.2.1 The test set

From all the Annex 1 substances (EC, 2004b), 108 substances labelled with R60, R61, R62, R63 and R64 were initially chosen for this test (listed in Appendix 1, excluding all non-organic chemicals). No chemicals were chosen that are not classified for reproductive effects. As DEREKfW and the TSCA Chemical Category List do not contain a structural alert for effects during lactation, substances labelled with R64 only (three substances) were excluded from this performance assessment. For the DEREKfW performance assessment, substances classified for reproductive toxicity were considered to be correctly recognised if they were not only recognised as reprotoxic chemicals, but also placed in the correct category (impaired fertility

