Statistical methods for the analysis of bioassay data


Citation for published version (APA):

Mzolo, T. V. (2016). Statistical methods for the analysis of bioassay data. Technische Universiteit Eindhoven.

Document status and date: Published: 19/04/2016

Document Version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Please check the document version of this publication:

• A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website.

• The final author version and the galley proof are versions of the publication after peer review.

• The final published version features the final layout of the paper including the volume, issue and page numbers.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

• You may not further distribute the material or use it for any profit-making activity or commercial gain.

• You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement:

www.tue.nl/taverne

Take down policy

If you believe that this document breaches copyright please contact us at:

openaccess@tue.nl

providing details and we will investigate your claim.

© Thembile Virginia Mzolo, 2016

Statistical Methods for the Analysis of Bioassay Data by T. Mzolo - Eindhoven University of Technology, 2016

A catalogue record is available from the Eindhoven University of Technology Library ISBN: 978-90-386-4051-8

THESIS

to obtain the degree of doctor at Eindhoven University of Technology, on the authority of the rector magnificus prof.dr.ir. F.P.T. Baaijens, to be defended in public before a committee appointed by the Doctorate Board

on Tuesday 19 April 2016 at 16:00

by

Thembile Virginia Mzolo

chair: prof. dr. J. de Vlieg

first promoter: prof. dr. E. R. van den Heuvel

co-promoter: dr. A. Di Bucchianico

members: prof. dr. E. C. Wit (Rijksuniversiteit Groningen)

dr. H. Geys (Universiteit Hasselt)

prof. dr. W. H. Woodall (Virginia Tech)

prof. dr. R. Gorb (Universität Würzburg)

The research or design described in this thesis has been carried out in accordance with the TU/e Code of Scientific Conduct.


This research work was carried out over a four-year period at the University Medical Center Groningen and Eindhoven University of Technology. A lot of people played a huge role in my success in this position and I cannot name them all here, but I will mention a few names.

Firstly, I am grateful to my promoter Prof. dr. Edwin van den Heuvel, who offered me this position a little over four years ago. This has been an amazing journey and I have been privileged to work with you. I learnt a lot from you and I am positive that we will continue working together in the future.

I am thankful to my co-promoter dr. Alessandro Di Bucchianico, who dedicated a lot of time to my thesis. Thank you very much for your insightful comments on my research work.

This work would not have been possible without the financial support from MSD, Oss, The Netherlands. I am grateful to Erik Talens and Pieta IJzerman-Boon for being part of my journey. I would also like to extend my gratitude to Marga Hendriks and Goele Goris, who allowed me to work on their master theses. To everyone at MSD, including those who were responsible for the data that I used for my thesis, thank you.

I am grateful to Prof Henry Mwambi and Prof Khangelani Zuma, who planted the biostatistics seed in me a few years ago when I was still in South Africa. Thank you for seeing something different in me and for your encouraging words.

To my colleagues in the Stochastics section at Eindhoven University of Technology and my former colleagues at the Medical Statistics unit at University Medical Center Groningen, thank you very much for your support.

To all members of the African Students Community in Belgium, Eindhoven and Groningen, you made me feel at home away from home. The list of names is too long to name everyone here; all I can say is thank you for your warm friendship.

Phindile, Linda, Sisa, and Kwande, thank you for your kindness and the spirit of ubuntu you have shown towards me. For you, driving from Bekkevoort all the way to Groningen or Eindhoven was never an issue. The good old South African ubuntu lives within you and I wish you more blessings in life.


Acknowledgements

Chapter 1: General introduction
1.1 A brief history of animal experiments
1.2 Areas of applications
1.3 Definition of bioassays
1.4 Calculation of bioactivity
1.4.1 Linear regression analysis for quantitative bioassays
1.4.2 Probit analysis for quantal bioassays
1.4.3 Goodness-of-fit and parallelism
1.5 Motivation
1.6 Aim of the thesis
1.6.1 Estimation of bioactivity
1.6.2 Estimation of product quality
1.7 Scope of the thesis
References

Chapter 2: Equivalence testing for similarity in bioassays: A critical note
2.1 Introduction
2.2 Statistical methodology
2.2.1 Dose-response relationships
2.2.2 Relative bioactivities
2.3 Hypothesis testing
2.3.1 Traditional hypothesis testing
2.3.2 Equivalence testing
2.4 Case studies
2.4.1 Parallel line bioassays
2.4.2 Slope-ratio bioassays
Appendix
References

Chapter 3: A comparison of statistical methods for combining relative bioactivities from parallel line bioassays
3.1 Introduction
3.1.1 Motivating example
3.1.2 Objectives
3.2 Parallel line bioassays analysis
3.3 (Un)weighted average methods
3.3.1 Bliss
3.3.2 Cochran
3.3.3 Morse and Bickle
3.4 Maximum likelihood methods
3.4.1 Armitage, Bennett, and Finney
3.4.2 Williams
3.4.3 Meisner, Kushner, and Laska
3.4.4 Hardy and Thompson
3.4.5 Random coefficient model
3.5 Pretests for homogeneity
3.6 A simulation study
3.6.1 Design of the simulation study
3.6.2 Simulation results
3.7 Discussion and conclusion
References

Chapter 4: Statistical process control methods for monitoring in-house reference standards
4.1 Introduction
4.2 Statistical methods
4.2.1 Bioassay experiments
4.2.2 Statistical model and hypotheses
4.2.3 Contrasts
4.2.4 The Exponentially Weighted Moving Average
4.2.5 α-Spending functions
4.3 Simulation study
4.3.3 Performance measure
4.4 Statistical results
4.4.1 Simulation results
4.4.2 An example
4.5 Discussion and conclusion
References

Chapter 5: Monitoring the bioactivity in the absence of an international reference standard
5.1 Introduction
5.2 Statistical methods
5.2.1 Strategy
5.2.2 Statistical details on monitoring the primary and secondary references
5.3 A simulation study
5.3.1 Performance measures
5.3.2 Probability of false alarms
5.3.3 Probability of detecting a shift
5.3.4 Probability of detecting a linear trend
5.4 Discussion and conclusion
Appendix
References

Chapter 6: A modified Satterthwaite approach for estimation of one-sided tolerance limits for general mixed effects models
6.1 Introduction
6.2 Statistical methods
6.2.1 Determination of the tolerance factor for tolerance intervals
6.2.2 The generalised pivotal quantity tolerance intervals
6.2.3 The modified large sample tolerance intervals
6.3 Simulation study
6.3.1 Two-way nested random effects model
6.3.2 Two-way crossed random effects model with interaction
6.3.3 Application to bioassay analysis data
6.4 Discussion and conclusion
References

7.1 Introduction
7.2 Methodology
7.2.1 Statistical models
7.2.2 Shelf life estimation
7.2.3 A theoretical comparison of the standard errors used in the estimation of shelf life
7.3 A simulation study
7.4 Discussion and conclusion
Appendix
References

Chapter 8: Concluding remarks
8.1 Estimation of the bioactivity
8.2 Monitoring the bioactivity
8.3 Estimation of tolerance limits
8.4 Estimation of the shelf life of a drug product

Summary

About the Author

Chapter 1: General introduction

1.1 A brief history of animal experiments

The first reported animal experiment concerned the discovery of diphtheria antitoxin (Youngdahl, 2010)∗. Diphtheria is a contagious bacterial infection caused by Corynebacterium diphtheriae which mainly affects the nose and throat, leading to breathing difficulties (Markel, 2007; Youngdahl, 2010). In the 19th century, diphtheria infection led to a high mortality rate among children (Markel, 2007)†. According to Youngdahl (2010), a heat-treated diphtheria toxin was used to immunise guinea pigs in 1890 by physicians Dr Emil von Behring and Dr Emile Roux. A few years later, the two physicians made a significant breakthrough in the fight against diphtheria by discovering a new effective treatment (Markel, 2007). In the lead-up to this discovery, Behring and Roux injected diphtheria toxin into healthy horses, then drew at most four blood samples from the injected horses, and these blood samples were subsequently stored under freezing conditions. The treatment consisted of the diphtheria antitoxins separated from the refrigerated blood samples (Markel, 2007). The first dose was made available on January 1, 1895, and this treatment proved to be highly efficacious, more so when administered at an early onset of infection (Markel, 2007). The discovery of this efficacious treatment resulted in a reduction in the death rate due to diphtheria (Markel, 2007; Youngdahl, 2010).

Following this major breakthrough, in which animals were used to find a treatment for humans, several experiments were subsequently performed. The discovery of vitamins is one example of scientific findings that led to a large reduction in deformity and mortality in humans (DeLuca, 2014). For instance, vitamins A to D were all discovered when scientists were investigating appropriate dietary requirements. For example, McCollum and Davis (1913) discovered that an ingredient found in butter fat and cod liver oil promotes growth and prevents xerophthalmia and eye infection in white rats. Similar experiments (using rats as experimental units) that followed discovered a fat-soluble factor, namely vitamin A, and a water-soluble factor, that is, vitamin B. Vitamin A was responsible for the prevention of xerophthalmia and vitamin B was responsible for the prevention of neurological diseases (DeLuca, 2014). A second water-soluble factor, termed vitamin C, was discovered from the same experiment, and this vitamin was responsible for the prevention of scurvy, which was more prevalent among sailors (DeLuca, 2014).

∗ History of Vaccines, http://www.historyofvaccines.org/content/blog/early-uses-diphtheria-antitoxin-united-states, Accessed 26-11-2015

† The New York Times, http://www.nytimes.com/2007/07/10/health/10hors.html?_r=0, Accessed

One of the mysteries that was still unsolved was the prevention of rickets, which was common among Scottish nationals (DeLuca, 2014). An early discovery on the causes of rickets involved an experiment in which dogs were fed food mostly eaten by Scottish people (Hess, 1929). In this particular experiment, the dogs were deprived of sunlight and eventually developed rickets (Mellanby, 1919). Later on, McCollum et al. (1922) discovered that vitamin A-deficient cod liver oil cured rickets. This led to the discovery of a new vitamin responsible for curing rickets, and it was called vitamin D (DeLuca, 2014; Hess, 1929; McCollum et al., 1922).

A graphical representation of an example of an animal experiment is shown in Figure 1.1. This is a two-arm treatment experiment: one arm consisted of rats that were injected with cancer cells and given antagomir treatment, and the other arm consisted of rats that were injected with cancer cells but not given any treatment. In this experiment, it was observed that rats that were not given treatment developed more metastases, while antagomir treatment reduced the number of lung metastases but had no effect on the original tumor cells.

Figure 1.1: A schematic example of a two-arm animal experiment. Source: Michele De Palma & Luigi Naldini, Antagonizing metastasis, Nature Biotechnology 28, 331-332 (2010)

All these experiments were conducted by physicians, biologists, or bacteriologists. The chemical extraction was performed in laboratories and the response was the observed reaction on the experimental unit. There were no strict binding regulations in place at the time to ensure that these experiments were conducted properly. Currently, animal experiments are highly regulated by regulatory bodies, such as the United States Pharmacopeia (USP), European Pharmacopoeia (EP), Food and Drug Administration (FDA), and International Conference on Harmonisation (ICH). This is mostly done to prevent the misuse of animals, and to maintain the standard and quality of medications. Note that these issues have been a continuous point of discussion since the discovery of diphtheria antitoxin involving horses, where the use of horses as ‘patients’ led to many disputes with the public (Markel, 2007).

1.2 Areas of applications

Bioassays are used in various applications, for example, in the pharmaceutical industry, where the bioactivity can be the main characteristic of a medicinal product (Irwin, 1950). They play a part in the development, licensing, manufacturing, stability testing, lot release and marketing of drug products in the pharmaceutical industry (Jeffcoate, 1996; Salvador et al., 2007; USP <1032>, 2010; USP <1033>, 2010). For example, they are used for cancer drugs or hormonal products. A highly active cancer drug is beneficial for destroying cancer tumor cells, albeit with serious adverse effects (Nastoupil et al., 2012; Rampling et al., 2004), while hormonal products are used in birth control and in fertility products. Even when the bioactivity is not the main characteristic, bioassays are used to quantify toxicity levels of drug products (Govindarajulu, 2001).

Bioassays are also used in applications that are not connected to drug products for humans. For example, bioassays are used in water pollution controls (Mackay et al., 1989), assessment of contaminated sediments (Brils et al., 2000), contaminated soils (Fernández et al., 2010; González et al., 2011), detection of dioxins and dioxin-like compounds in wastes and the environment (Takigami et al., 2008), and growth responses of plants (Salvador et al., 2007). Furthermore, bioassays are used in veterinary studies; for example, Bell et al. (1967) studied the serum and urinary gonadotrophin levels in pregnant donkeys.

1.3 Definition of bioassays

A bioassay is a (scientific) experiment conducted to estimate the biological activity (bioactivity for short) of an unknown test preparation (Finney, 1978; Govindarajulu, 2001). There are two types of bioassays: direct and indirect. A direct bioassay measures the dose or concentration of a stimulus that is needed to obtain a certain well-defined response, that is, a response that is unambiguous and easily recognised (Finney, 1978). A typical example is the lethal dose in a certain animal experiment. When the stimulus is infused at a fixed rate and the time to death is measured, the dose is the time multiplied by the rate and the response is death. An indirect bioassay seeks to estimate equally effective doses of a standard and a test preparation (Finney, 1978). This means that the potency or biological activity of the test preparation, that is, a measure of its biological strength, is expressed in dose units of the standard. Indirect bioassays are considered more reliable than direct assays, since direct assays often demonstrate more variability than indirect bioassays (Finney, 1978).

In an indirect bioassay it is common practice that several doses of the standard and test preparation are considered. Equally spaced doses and an equal number of doses per preparation (symmetry) are preferred for highest precision (Finney, 1978), but they are not required. The biological response is the reaction observed at the biological unit caused by the application of the doses of the preparations. Based on these results, the bioactivity of the test preparation is estimated (see Section 1.4). Note that the standard preparation is a known and available robust preparation which is not affected by time and which has similar properties to those of a test preparation (Irwin, 1950). These known preparations are often international reference standards and they were created to standardise the bioassays. Available international reference standards are documented in the chapters of the United States Pharmacopeia. Known standards can also be in-house reference standards created from a product (Müller et al., 1996). Bioassays are classified as in vivo or in vitro, where in vivo bioassays involve the use of animals and in vitro bioassays use biological materials (such as cells) tested on 96-well plates, in Petri dishes or in test tubes (Finney, 1978; USP <1032>, 2010; USP <1033>, 2010).

Figure 1.2: A schematic example of an in vivo bioassay experiment. Source: Bruno Lunenfeld, Historical perspectives in gonadotrophin therapy, Human Reproduction Update, 10(6), 453-467 (2004)

authors gave a graphical representation of a Steelman-Pohley assay for a gonadotrophin hCG (test preparation) and FSH (follicle-stimulating hormone, standard preparation), which is reproduced in Figure 1.2. In this assay, one group of rats was injected with hCG and the other group with FSH. The ovaries of the rats were harvested and weighed, as shown on the scale in Figure 1.2. Based on the scale, the bioactivity is the amount of hCG required to yield ovaries of similar weight to the ovaries of rats injected with FSH.

An example of an in vitro bioassay is that of measuring the activity of an influenza virus. Such a bioassay has been studied by Sidwell and Smee (2000) and Van Kessel et al. (2012). In this experiment, a gel is poured on a plate and left to harden. Then a dilution series of a standard antigen and a test sample are punched into the plate. An incubation period of 18 hours is allowed, and at the end of this incubation period the plate is washed and the diameter of the formed rings is measured (see Figure 1.3). The larger the diameter, the more active the virus is.

Figure 1.3: A schematic example of an in vitro bioassay experiment for influenza. Source: Van Kessel et al. (2012)

1.4 Calculation of bioactivity

In the early animal experiments, for diphtheria and vitamins, the biological response was binary, that is, these were quantal bioassays. For diphtheria, the response was whether a horse did or did not recover, and for vitamin D the response was whether the dogs did or did not develop rickets. At the time, no sound statistical analysis was involved; sound statistical calculations only began at the turn of the 20th century. Some of the earliest works were by Irwin (1937) and Bliss and Cattell (1943). The work of Irwin (1937) focused on both quantal and quantitative (e.g., ovarian weight) responses. For the quantal bioassays a normal sigmoid dose-response relation was suggested, while for the quantitative bioassays a linear regression analysis was considered. Finney (1947) and Finney (1978) presented a general formulation of statistical methods for the estimation of the bioactivity in indirect bioassays. To illustrate the calculations we will describe the linear regression analysis for quantitative bioassays (used in the Steelman-Pohley assay) and probit analysis for quantal bioassays.

1.4.1 Linear regression analysis for quantitative bioassays

Let $x_{ij}$ (with $i = 1, 2$ and $j = 1, 2, \ldots, J_i$) be the $j$th dose for the $i$th preparation. We assume that $i = 1$ represents the standard preparation and $i = 2$ the test preparation; $J_i$ is the number of doses for the $i$th preparation. Now, let $Y_{ijk}$ be the $k$th biological response ($k = 1, 2, \ldots, K_{ij}$) measured at dose $x_{ij}$. The most common statistical model for quantitative bioassays from a historical point of view is the parallel line model (Finney, 1978) given by

$$Y_{ijk} = \alpha_i + \beta \log x_{ij} + \varepsilon_{ijk}, \tag{1.1}$$

where $\alpha_i$ is the intercept of the $i$th preparation, $\beta$ is the common slope, and $\varepsilon_{ijk}$ is the residual, assumed to be independent and normally distributed, $\varepsilon_{ijk} \sim N(0, \sigma^2)$. A graphical display of the linear dose-response relationship is shown in Figure 1.4.

Figure 1.4: A depiction of a linear dose-response relationship for a standard (solid line) preparation and a test (dashed line) preparation

line of the form in (1.1) was applied. In that assay, the biological response was the ovarian weight. The standard preparation was the FSH and the test preparation was the hCG.

If both lines fall on top of each other, the test preparation provides exactly the same biological response as the standard preparation. Thus, both preparations must be identical on the applied dose range. This implies that the relative bioactivity of the test preparation with respect to the standard is equal to one. If the line of the test preparation lies to the right of that of the standard preparation, then the test preparation is less potent or weaker than the standard preparation, because the test preparation needs higher doses to obtain the same biological response as the standard preparation. If the line of the test preparation falls to the left of that of the standard, the test preparation is more potent than the standard. For the parallel line model in (1.1), the potency of the test preparation with respect to the standard preparation is defined as $\rho = \exp\{(\alpha_2 - \alpha_1)/\beta\}$.

To estimate the bioactivity, Model (1.1) is fitted to the bioassay data to estimate the parameters of this model. The estimation of these parameters is typically performed with ordinary least squares. The estimate of the bioactivity of the test sample is then obtained by substituting the parameter estimates in $\rho$. It is given by

$$\log\hat\rho = \frac{\hat\alpha_2 - \hat\alpha_1}{\hat\beta}. \tag{1.2}$$
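The least squares fit of Model (1.1) and the substitution in (1.2) can be sketched in a few lines of Python. This is a minimal illustration and not code from the thesis: the function names `fit_parallel_lines` and `log_relative_potency` and the dictionary layout of the data are assumptions made for this example. It relies on the standard result that, for two lines with a common slope, the least squares slope is the pooled within-preparation cross-product divided by the pooled within-preparation sum of squares of the log doses.

```python
import math


def fit_parallel_lines(doses, responses):
    """Least squares fit of Y = alpha_i + beta * log(x) with a common slope.

    doses, responses: dicts keyed by preparation (1 = standard, 2 = test);
    doses[i] and responses[i] are equally long lists (hypothetical layout).
    Returns (alpha_1, alpha_2, beta).
    """
    preps = (1, 2)
    log_x = {i: [math.log(x) for x in doses[i]] for i in preps}
    mean_x = {i: sum(log_x[i]) / len(log_x[i]) for i in preps}
    mean_y = {i: sum(responses[i]) / len(responses[i]) for i in preps}
    # Pooled within-preparation cross-products and sums of squares.
    sxy = sum((x - mean_x[i]) * (y - mean_y[i])
              for i in preps for x, y in zip(log_x[i], responses[i]))
    sxx = sum((x - mean_x[i]) ** 2 for i in preps for x in log_x[i])
    beta = sxy / sxx
    # Each intercept makes its line pass through that preparation's means.
    alpha = {i: mean_y[i] - beta * mean_x[i] for i in preps}
    return alpha[1], alpha[2], beta


def log_relative_potency(alpha_1, alpha_2, beta):
    """Equation (1.2): log(rho_hat) = (alpha_2 - alpha_1) / beta."""
    return (alpha_2 - alpha_1) / beta
```

With noise-free responses generated from a test preparation that is twice as potent as the standard (that is, $\alpha_2 = \alpha_1 + \beta\log 2$), the sketch recovers $\log\hat\rho = \log 2$.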

The calculation of the variance of the bioactivity in (1.2) requires the variances and covariances of the parameter estimators $\hat\alpha_1$, $\hat\alpha_2$, and $\hat\beta$. They are given by $\sigma^2\nu_{11}$, $\sigma^2\nu_{22}$, and $\sigma^2\nu_{12}$, respectively, where $\sigma^2$ is the residual variance. The constants $\nu_{11}$, $\nu_{22}$, $\nu_{12}$ are known parameters determined by the design of the parallel line bioassay and they depend on the log doses (Finney, 1978). Using the delta method (Cramér, 1946), the variance of the log bioactivity in (1.2) is approximated by

$$\tau^2 = \frac{\sigma^2}{\beta^2}\left(\nu_{11} - 2\log\rho\,\nu_{12} + (\log\rho)^2\nu_{22}\right). \tag{1.3}$$

It can be estimated with $S^2$ by substituting the parameter estimates. The number of degrees of freedom ($df$) is often taken as the number that corresponds to the estimate $\hat\sigma^2$ from the regression analysis. A confidence interval on the log bioactivity ($\log\rho$) is then calculated as $\log\hat\rho \pm t^{-1}_{df}(1 - \alpha/2)\,S$, with $t^{-1}_{n}(q)$ the $q$th quantile of the $t$-distribution with $n$ degrees of freedom.
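As a hedged illustration of (1.3) and the resulting interval, the sketch below evaluates $S^2$ and the limits $\log\hat\rho \pm t\,S$. The function name `delta_method_ci` is hypothetical. The code assumes, as the form of (1.3) suggests, that $\sigma^2\nu_{11}$ is the variance of $\hat\alpha_2 - \hat\alpha_1$, $\sigma^2\nu_{22}$ the variance of $\hat\beta$, and $\sigma^2\nu_{12}$ their covariance. The $t$ quantile is passed in as a number (for example from a table), because the Python standard library does not provide it.

```python
import math


def delta_method_ci(log_rho, beta, sigma, nu11, nu12, nu22, t_quantile):
    """Delta-method interval for log(rho), following equation (1.3).

    log_rho, beta, sigma: estimates from the fitted parallel line model.
    nu11, nu12, nu22: design constants (see the assumption in the lead-in).
    t_quantile: the (1 - alpha/2) quantile of the t-distribution with df
    degrees of freedom, supplied by the caller.
    Returns (lower, upper).
    """
    # S^2: equation (1.3) with the estimates substituted.
    s2 = (sigma ** 2 / beta ** 2) * (
        nu11 - 2.0 * log_rho * nu12 + log_rho ** 2 * nu22
    )
    half_width = t_quantile * math.sqrt(s2)
    return log_rho - half_width, log_rho + half_width
```

Confidence limits for $\rho$ itself follow by exponentiating both limits.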

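An alternative approach to obtain confidence intervals is the approach of Fieller (Fieller, 1944). It is given by

$$X_L, X_U = \left[\log\hat\rho - \frac{g\,\nu_{12}}{\nu_{22}} \pm \frac{t^{-1}_{df}(1-\alpha/2)\,\hat\sigma}{\hat\beta}\sqrt{\nu_{11} - 2\log\hat\rho\,\nu_{12} + (\log\hat\rho)^2\nu_{22} - g\left(\nu_{11} - \frac{\nu_{12}^2}{\nu_{22}}\right)}\,\right] \Big/ (1-g), \tag{1.4}$$

where $g = (t^{-1}_{df}(1-\alpha/2))^2\hat\sigma^2\nu_{22}/\hat\beta^2$, and $X_L$ and $X_U$ are lower and upper confidence limits of the log bioactivity. If $g$ is close to zero (i.e., $\hat\beta \gg \hat\sigma\sqrt{\nu_{22}}$), then the confidence interval in (1.4) simplifies to the confidence interval obtained with the delta method, that is,

$$X_L, X_U = \left[\log\hat\rho \pm \frac{t^{-1}_{df}(1-\alpha/2)\,\hat\sigma}{\hat\beta}\sqrt{\nu_{11} - 2\log\hat\rho\,\nu_{12} + (\log\hat\rho)^2\nu_{22}}\,\right]. \tag{1.5}$$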
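The Fieller limits can be sketched in the same style. The function name `fieller_ci` is hypothetical, and the formula coded here is the standard textbook form of Fieller's interval for a ratio, under the same assumed meaning of $\nu_{11}$, $\nu_{12}$, $\nu_{22}$ as for the delta method (variance of $\hat\alpha_2 - \hat\alpha_1$, their covariance, and variance of $\hat\beta$, each divided by $\sigma^2$). It is offered as an illustration rather than as the thesis's own implementation.

```python
import math


def fieller_ci(log_rho, beta, sigma, nu11, nu12, nu22, t_quantile):
    """Fieller-type confidence limits for log(rho), as in equation (1.4).

    t_quantile: the (1 - alpha/2) t quantile, supplied by the caller.
    Raises ValueError when g >= 1, i.e. when the slope is not significantly
    different from zero and no bounded interval exists.
    """
    g = (t_quantile ** 2) * (sigma ** 2) * nu22 / beta ** 2
    if g >= 1.0:
        raise ValueError("g >= 1: slope too imprecise for a bounded interval")
    centre = log_rho - g * nu12 / nu22
    radicand = (nu11 - 2.0 * log_rho * nu12 + log_rho ** 2 * nu22
                - g * (nu11 - nu12 ** 2 / nu22))
    # abs(beta) keeps the lower limit below the upper limit for either sign.
    half_width = (t_quantile * sigma / abs(beta)) * math.sqrt(radicand)
    return (centre - half_width) / (1.0 - g), (centre + half_width) / (1.0 - g)
```

When $g$ is close to zero the two limits are numerically almost identical to the delta-method limits, in line with (1.5); for larger $g$ the Fieller interval becomes wider and, in general, asymmetric around $\log\hat\rho$.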

1.4.2 Probit analysis for quantal bioassays

In quantal bioassays the response $Y_{ijk}$ is binary. One way of describing the dose-response relationship is to apply

$$E\,Y_{ijk} = \Phi(\alpha_i + \beta\log x_{ij}) \tag{1.6}$$

with $\Phi$ the standard normal distribution function and the remaining parameters as defined in Section 1.4.1. If at each dose $x_{ij}$ the proportion $\hat p_{ij} = \sum_{k=1}^{K_{ij}} Y_{ijk}/K_{ij}$ is within $(0, 1)$, a parallel line model can be applied to $\Phi^{-1}(\hat p_{ij})$. In this case Model (1.1) may fit with $k = 1$. All calculations in Section 1.4.1 may be applied. Although the parallel line model may be used as an approximate model, as was done in the early days of bioassays, the method of maximum likelihood for the observations $Y_{ijk}$ with form (1.6) may be a better approach.

Figure 1.5: A depiction of a parallel probit dose-response relationship for a standard preparation (solid curve) and a test preparation (dashed curve)

One reason is that proportions $\hat p = 0$ or $\hat p = 1$ can still be part of the maximum likelihood estimation, whereas the parallel line model applied to $\Phi^{-1}(\hat p_{ij})$ cannot use them; only if the proportions are somewhat adjusted is this possible. Another reason is that the residuals in (1.1) do not follow a normal distribution when applied to $\Phi^{-1}(\hat p_{ij})$, although this would only be a serious issue when $K_{ij}$ is relatively small.

The parallel line model applied to $\Phi^{-1}(\hat p_{ij})$ does demonstrate that the potency $\rho$ can still be calculated as $\rho = \exp\{(\alpha_2 - \alpha_1)/\beta\}$ for quantal bioassays of the dose-response form in (1.6). Substituting the ML estimates for $\alpha_1$, $\alpha_2$, and $\beta$ provides an ML potency estimate. The confidence interval on $\log\rho$ is typically calculated with the delta method where $t^{-1}_{df}(1 - \alpha/2)$ is replaced by a normal quantile $z_{\alpha}$. A graphical representation of the parallel probit models is given in Figure 1.5.
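The approximate approach described in this section, applying the parallel line model to $\Phi^{-1}(\hat p_{ij})$, can be sketched as follows. The function names are hypothetical and the fit is an unweighted least squares fit, which ignores the unequal variances of the transformed proportions. It is therefore only an illustration of the early approximate method, not of the maximum likelihood approach. The inverse normal distribution function is taken from `statistics.NormalDist` in the Python standard library.

```python
from statistics import NormalDist


def empirical_probits(successes, trials):
    """Return Phi^{-1}(p_hat) for each dose, with p_hat = successes / trials.

    Proportions of exactly 0 or 1 are rejected, because their probits are
    infinite; this is the limitation of the approximate method noted in
    the text.
    """
    probits = []
    for y, k in zip(successes, trials):
        p_hat = y / k
        if not 0.0 < p_hat < 1.0:
            raise ValueError("p_hat must lie strictly between 0 and 1")
        probits.append(NormalDist().inv_cdf(p_hat))
    return probits


def log_potency_from_probits(log_doses, probits):
    """Unweighted common-slope fit to probits; returns log(rho_hat).

    log_doses, probits: dicts keyed by preparation (1 = standard, 2 = test),
    a hypothetical layout chosen for this sketch.
    """
    preps = (1, 2)
    mx = {i: sum(log_doses[i]) / len(log_doses[i]) for i in preps}
    mz = {i: sum(probits[i]) / len(probits[i]) for i in preps}
    sxz = sum((x - mx[i]) * (z - mz[i])
              for i in preps for x, z in zip(log_doses[i], probits[i]))
    sxx = sum((x - mx[i]) ** 2 for i in preps for x in log_doses[i])
    beta = sxz / sxx
    alpha = {i: mz[i] - beta * mx[i] for i in preps}
    return (alpha[2] - alpha[1]) / beta
```

A weighted fit, or maximum likelihood on the binary observations directly, would account for the binomial variance of each proportion and would also handle proportions of 0 or 1.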

1.4.3 Goodness-of-fit and parallelism

Prior to estimating the bioactivity, several assumptions about the dose-response relationships in (1.1) and (1.6) have to be met. For quantitative bioassays these are the assumptions commonly made in linear regression, and they include linearity, normality of residuals and constant variance (homogeneity assumption) (Kutner et al., 2005). For quantal bioassays the focus is mostly on the selected dose-response relationship. If any of these assumptions is not met, remedial measures have to be taken.

In linear regression a transformation of the biological response can be considered to keep the same Model (1.1) for potency estimation. Alternatively, a completely different relationship can be selected. This would also be the case for the quantal bioassays. A general formulation for the response outcome $Y_{ijk}$ (or the transformation thereof) is

$$E\,Y_{ijk} = F_{\eta_i}(\alpha_i + \beta_i\psi(x_{ij})), \tag{1.7}$$

with $F_{\eta_i}$ a sigmoid curve depending on the parameter $\eta_i$, with $\psi$ a transformation of the dose (such as the logarithmic transformation), and $\alpha_i$ and $\beta_i$ an intercept and slope for preparation $i$. In general, the relationship (1.7) provides enough flexibility to generate a reasonable dose-response relationship.

Calculation of the bioactivity also requires that both preparations are parallel or similar. For Models (1.1) and (1.6) this is guaranteed by the slope $\beta$ that is common to both preparations. In general, in formulation (1.7) this can also be guaranteed, but the set of restrictions on the parameters depends on the choice of $\psi$ (see Chapter 2 for more details). Parallelism would imply that the test preparation is a dilution of the standard or the other way round (with respect to their biological response). This means that the two preparations are biologically the same; they only differ in strength. Parallelism is an important characteristic in the bioassay analysis (USP

1.5 Motivation

Bioassays play a crucial role in the development, licensing, and marketing of biological products in the pharmaceutical industry. For instance, they are part of clinical trials, process optimisation (process analytical technology), the setting of product specifications, and the quantification of product shelf life. All these areas often encounter product-specific challenges that require tailor-made statistical methodology for the bioassays to work optimally, that is, to have the highest precision and to be reliable.

Statistical methods for bioassays are unique (Jeffcoate, 1996). One reason is the elaborate experimentation that is often needed to obtain just one relative bioactivity and the statistical effort that is put into providing an appropriate bioactivity. Given the uniqueness of bioassay data, some of the currently available statistical methodology may not be suitable or may not have been adapted for this application. Another reason is that bioassays are mostly indirect, contrary to microbiological methods that directly measure the bacterial quantity. The information that is provided by a bioassay experiment, whether the biological response is continuous (e.g., ovarian weight) or binary (e.g., dead/alive), is a relative bioactivity with respect to a standard. The outcome of the bioassay is an estimate of the bioactivity, its precision and a number of degrees of freedom for the precision. Most measurement systems only provide a value without a standard error and degrees of freedom.

This necessitates more research work to fully evaluate, understand, and develop new methodology appropriate for the bioassay application at hand. This thesis will focus on two parts of statistical methodology for biological assays, which are strongly interrelated. The first part is on the estimation of the bioactivity and the second part is on the estimation of product quality. The equations described in Section 1.4 are based on one bioassay experiment or one bioassay run. However, it is common practice to perform more than one bioassay run. These multiple bioassay runs are performed to improve the estimation of the relative bioactivity. The resulting outcome from each bioassay run is a triplet consisting of the relative potency, its standard error, and its degrees of freedom. If $H$ bioassay runs are performed, then there will be $H$ sets of parameter estimates, $(X_1, S_1, df_1), (X_2, S_2, df_2), \ldots, (X_H, S_H, df_H)$. These sets of estimates are used to estimate a pooled or combined bioactivity estimate where the variability between estimates ($X_h$) and within estimates ($S_h$) is taken into account. For the estimation of the bioactivity, a statistical method that is highly precise when combining bioassay estimates is still not known. Another issue in the bioassay analysis is that official guidelines require the similarity assumption to be assessed using an equivalence hypothesis, but the feasibility of this approach has not been appropriately evaluated in the context of bioassays.
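To make the data structure of the per-run results concrete, the sketch below pools the estimates $X_h$ with inverse-variance weights $1/S_h^2$. This is only one simple textbook option, chosen here for illustration: it treats the runs as sharing one true value, so it ignores between-run variability and the degrees of freedom $df_h$, and it is not put forward as the method recommended in this thesis. The function name `combine_inverse_variance` is hypothetical.

```python
import math


def combine_inverse_variance(estimates, standard_errors):
    """Pool per-run log potency estimates X_h with weights 1 / S_h^2.

    Assumes a common true value across runs (no between-run variance).
    Returns (pooled_estimate, pooled_standard_error).
    """
    weights = [1.0 / s ** 2 for s in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se
```

Runs with smaller standard errors receive more weight. Methods that also model the between-run variability would generally give wider, more realistic intervals when the runs disagree.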

The second part of the thesis focuses on several topics of product quality, where the quality of the product is mainly measured using the bioactivity. Statistical methodology for monitoring the bioactivity of a known in-house preparation with respect to an international reference standard, or for monitoring the bioactivity of a known in-house preparation when an international reference standard is not available, is very limited or nonexistent. This means that new methodology is required to enable the monitoring of the bioactivity, and this methodology should take into account the bioassay data structure. Monitoring the standard preparations helps guarantee the quality of drug products that are released to the market. Another quality-related aspect of a drug product is the determination of specification limits.

The characteristics of drug products manufactured by pharmaceutical companies are expected to lie within a certain range, commonly known as the specification limits. These limits are often set using tolerance limits of the bioactivity. Statistical methods that are currently available were developed for specific settings and, as a result, are not applicable to more general settings or to the more complex designs that are common for bioassays. The last topic on product quality investigates the estimation of shelf life. This is the time during which the product will remain biologically active. The estimation of the shelf life of a product is well documented by the ICH. The method proposed there does not take into account the design of the bioassay. For example, the standard error and degrees of freedom are not accounted for in the shelf life estimation, nor is the design structure that would be imposed by bioassay runs. This implies that there is a higher likelihood of imprecisely estimating the shelf life, especially when there is significant variability between bioactivity estimates.

1.6 Aim of the thesis

The specific aims of the thesis with more detailed explanation on the issues regarding bioactivity estimation and product quality are given in Section 1.6.1 and Section 1.6.2, respectively.

1.6.1 Estimation of bioactivity

For the relative bioactivity to be fit-for-use, well-designed and well-executed bioassay runs must be conducted. The relative bioactivity is estimated with an appropriate model fitting the data. Regulatory recommendations require that the test preparation (or test sample) behaves similarly to the standard preparation (or known sample). The test preparation is then considered as a dilution of the standard preparation (see Section 1.4.2). In practice, evaluation of similarity is performed by assessing the traditional null hypothesis of similarity or by the hypothesis of equivalence (Callahan and Sajjadi, 2003; Gottschalk and Dunn, 2005; Hauck et al., 2005; Yellowlees et al., 2013). Regulators recommend that the equivalence hypothesis is used over the traditional hypothesis. The equivalence approach may accept a small but significant violation of the traditional null hypothesis on similarity, while the traditional hypothesis should not reject similarity between the two preparations. In Chapter 2, an equivalence hypothesis formulated on the relative bioactivity instead of the model parameters is proposed. The consequences of this hypothesis are critically evaluated and compared with the traditional hypothesis.

Two practical situations may lead to pooled or combined bioactivities from multiple individual bioactivities. The first situation is when a bioactivity intended as a reportable value is not precise enough. Multiple bioassay runs are then conducted to improve precision. The second situation is when a bioassay run consists of multiple bioactivities to determine a reportable value. This is often the case in in vitro bioassays with multiple well-plates, where each plate provides one bioactivity. Combining or pooling information from different experiments is referred to as meta-analysis (Glass, 1976). Meta-analysis is strongly associated with medical sciences and not so much with bioassays, although pooling bioactivities has a longer history and originated from bioassays and social sciences. In meta-analysis, a parameter of interest is typically taken from a (linear or logistic) regression model, such as an overall effect of a drug taken from different studies to improve the estimate of the effect size. For bioassays, the relative bioactivity may not just be one parameter but a function of several model parameters, as in the parallel line model (see (1.2)). Combining estimates from different studies is often complicated by the fact that estimates tend to be highly heterogeneous. This implies that the variability between estimates is larger than the precision of individual estimates (Cochran, 1954; Finney, 1978). Thus, statistical models for pooling estimates must account for sources of variation between studies. Failing to account for these sources of variation may result in the underestimation of the standard error of the pooled estimate. For bioassays, underestimation may lead to false acceptance of a drug product. Between-bioassay variability is due to differences in the conditions under which the experiments are conducted, and these conditions affect the bioactivities.

Many approaches for combining bioactivities were developed long before meta-analysis became popular. These include the method of moments, simple averages, and likelihood approaches (Bliss, 1952; Cochran, 1954; Jeffcoate, 1996; Laska and Meisner, 1987; Meisner et al., 1986; Morse and Bickle, 1967). In meta-analysis other approaches have been developed. These include mixed effects (Searle et al., 1992; Searle, 1971) and profile likelihood (Hardy and Thompson, 1996) approaches. However, it is unknown how these methods perform when pooling bioactivities. Even the older approaches have never been compared in full. In Chapter 3, all these approaches are assessed to determine the most efficient approach for combining bioactivity estimates, and the effects of heterogeneity and sample sizes are assessed. The results have been published in the Pharmaceutical Statistics journal (Mzolo et al., 2013).
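To make the idea of pooling concrete, the following is a minimal sketch of one moment-based random-effects estimator (the DerSimonian-Laird form). It is offered only as an illustration and is not claimed to be any specific method compared in Chapter 3. The function name and arguments are hypothetical, the estimates are assumed to be log relative bioactivities, and the degrees of freedom of each run are deliberately ignored.

```python
import math

def pool_bioactivities(estimates, std_errors):
    """Pool H estimates X_h with standard errors S_h (illustrative sketch).

    Assumption: a simple random-effects model with a moment estimator of
    the between-run variance tau2; degrees of freedom are not used.
    """
    # Inverse-variance weights that ignore between-run variability.
    w = [1.0 / s ** 2 for s in std_errors]
    fixed = sum(wi * x for wi, x in zip(w, estimates)) / sum(w)

    # Cochran's Q statistic measures heterogeneity between runs.
    q = sum(wi * (x - fixed) ** 2 for wi, x in zip(w, estimates))
    h = len(estimates)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)

    # Moment estimate of the between-run variance, truncated at zero.
    tau2 = max(0.0, (q - (h - 1)) / c)

    # Re-weight with within-run plus between-run variance.
    w_star = [1.0 / (s ** 2 + tau2) for s in std_errors]
    pooled = sum(wi * x for wi, x in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2

# Hypothetical log relative bioactivities from three runs.
pooled, pooled_se, tau2 = pool_bioactivities([0.0, 0.5, 1.0], [0.1, 0.1, 0.1])
print(pooled, pooled_se, tau2)
```

When the runs are heterogeneous, the pooled standard error from this sketch is larger than the fixed-weight standard error, which illustrates why ignoring between-run variation underestimates the uncertainty.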

1.6.2 Estimation of product quality

The relative bioactivity of a product is estimated relative to a known standard preparation. The international reference standard tends to be very expensive for routine purposes, and to minimise costs, pharmaceutical companies may develop their own in-house reference standard. The in-house reference standard is tested against the international reference standard to quantify its bioactivity. The consequence of using an in-house reference is that the pharmaceutical company is obliged to assess the biological stability of this in-house reference standard by monitoring its bioactivity during the period of its use. For example, this period could be five years, and the tests to evaluate the stability of this bioactivity could be performed annually. If it is found that the bioactivity of the in-house reference standard is not stable, then a new in-house reference standard should be created and tested against the international reference standard.

Several statistical methods are available for monitoring the quality of products. These include the exponentially weighted moving average (EWMA) (Roberts, 1959) and cumulative sum (CUSUM) (Page, 1954) control charts, and Q charts, all of which are commonly used in engineering applications. Monitoring bioactivities of an in-house standard using these control charts has two complications. Firstly, the frequency and duration of testing for in-house standards lead to a limited number of follow-up points, which implies that average run lengths, which are used to evaluate control charts, are not useful for evaluating the bioactivities. Secondly, the data at each follow-up point may consist of a set of heterogeneous bioactivities accompanied by standard errors and degrees of freedom. How to incorporate this additional information in the monitoring is unknown.
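For orientation, a textbook EWMA chart can be sketched as follows. This is the standard engineering form, not an adaptation to bioassay data: it assumes one value per follow-up point with a known, common standard deviation, and so it does not use the standard errors and degrees of freedom discussed above. The function name and the default smoothing constant and limit width are hypothetical choices.

```python
import math

def ewma_chart(values, target, sigma, lam=0.2, width=3.0):
    """Return (statistic, lower limit, upper limit, signal) per time point.

    Assumptions: independent observations with known standard deviation
    `sigma`; `lam` is the smoothing constant and `width` the limit
    multiplier. Both defaults are illustrative only.
    """
    z = target  # the EWMA statistic starts at the target value
    rows = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact (time-varying) standard deviation of the EWMA statistic.
        sd = sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        lower, upper = target - width * sd, target + width * sd
        rows.append((z, lower, upper, not (lower <= z <= upper)))
    return rows

# Hypothetical annual bioactivities of an in-house standard (target 1.0).
for row in ewma_chart([1.0, 1.0, 2.0], target=1.0, sigma=0.1):
    print(row)
```

With only a handful of follow-up points, as in the five-year example above, the chart produces very few decisions, which is one reason run-length summaries are of limited use here.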

Chapter 4 of this thesis attempts to address these two issues for monitoring the in-house reference standard by adjusting the existing methods to this application. Different follow-up schemes are also investigated to assess how the power of detecting changes in the bioactivity is influenced. The chapter has been published in the Statistics in Biopharmaceutical Research journal (Mzolo et al., 2015).

For some products, an international reference standard does not exist, and in such cases a new approach is needed to create and monitor the bioactivity of an in-house reference standard. In Chapter 5 of this thesis, a strategy together with a testing scheme is devised to enable the pharmaceutical company to create and monitor the stability of relative bioactivities. This strategy involves creating a primary and a secondary reference standard, where the former replaces the international reference standard and the latter replaces the in-house reference standard. The stability of the primary reference is now the responsibility of the pharmaceutical company, and it is of great importance because this reference is used to qualify secondary references. Thus, both the primary and the secondary standard are to be controlled and monitored. The proposed strategy is used to guarantee stability in the standards and to assess the longevity of the primary reference. In this thesis, optimal parameters of an EWMA control chart used for monitoring the bioactivity of the primary reference are determined.

The bioactivity of a drug product is expected to lie within a particular range called the specification limits. The specification limits can be determined by estimating tolerance limits of the relative bioactivity. Currently, a number of statistical approaches are available for estimating both one- and two-sided tolerance limits. However, these statistical methodologies were developed for simple statistical models and are not easily generalisable to more complex designs with higher-order analysis of variance (ANOVA) models. The bioactivity can typically be affected by a number of sources of variation, which need to be included in the tolerance limits if specifications are to be set realistically. In Chapter 6, a flexible approach for determining one-sided tolerance limits applicable to any variance component model is proposed.

The final research chapter of this thesis focuses on the estimation of the shelf life of drug products using bioassays. In order to estimate the shelf life of a drug product, batches of products produced by pharmaceutical companies are subjected to different storage conditions. Stability studies enable the determination of the shelf life of the drug product under these conditions. Any product will degrade, but the degradation rates may vary with storage conditions over time. In the literature so far, the effect of bioassay runs has been ignored in the estimation of shelf lives. Consequently, the bioassay run variability, bioactivity precision, and degrees of freedom are not accounted for. In Chapter 7 of this thesis, enhanced statistical analyses for two experimental designs are compared for estimating shelf life.

1.7 Scope of the thesis

The thesis is organised as follows: the parallelism of the test and standard sample is introduced in Chapter 2. In this chapter, two types of testing procedures are introduced and compared using case studies and simulations. Methods for estimating a relative bioactivity are introduced in Chapter 3. The efficiency and precision of these methods are fully examined by means of simulations. In Chapter 4, methods for monitoring an in-house reference standard are discussed. These include methods that are used in dose-finding studies, and their applicability to biological assays is assessed. However, some products do not have an international or standard reference standard that enables the estimation of bioactivities. As a result, a scheme and testing strategy is introduced in which both the primary (standard) and secondary (test) references are monitored through time; this is covered in Chapter 5. Chapter 6 introduces a method for setting up specification limits using tolerance limits. The designs of stability degradation studies are introduced in Chapter 7. The designs enable a precise estimation of the shelf life. The last chapter of the thesis summarises the conclusions and the prospects for further research.


References

Bell, E. T., Loraine, J. A., Jennings, S., and Weaver, A. D. (1967), “Serum and Urinary Gonadotrophin Levels in Pregnant Ponies and Donkeys,” Quarterly Journal of Experimental Physiology, 52, 68–75.

Bliss, C. I. (1952), The Statistics of Bioassay, New York: Academic Press Inc.

Bliss, C. I. and Cattell, M. (1943), “Biological Assay,” Annual Review of Physiology, 5, 479–539.

Brils, J., Stronkhorst, J., and Van De Guchte, K. (2000), “The Status and Use of Bioassays for the Assessment of Contaminated Sediments in the Netherlands,” In Workshop Report, 12–16.

Callahan, J. and Sajjadi, N. (2003), “Testing the Null Hypothesis for a Specified Difference: The Right Way to Test for Parallelism,” Bioprocessing Journal, 2, 71–78.

Cochran, W. G. (1954), “The Combination of Estimates from Different Experiments,” Biometrics, 10, 101–129.

Cramér, H. (1946), Mathematical Methods of Statistics, Princeton, NJ: Princeton University Press.

DeLuca, H. F. (2014), “History of the Discovery of Vitamin D and Its Active Metabolites,” BoneKEy Reports, 3, 479.

Fernández, M. D., Babín, M., and Tarazona, J. V. (2010), “Application of Bioassays for the Ecotoxicity Assessment of Contaminated Soils,” Methods in Molecular Biology (Clifton, N.J.), 599, 235–62.

Fieller, E. C. (1944), “A Fundamental Formula in the Statistics of Biological Assay, and Some Applications,” Quarterly Journal of Pharmacy and Pharmacology, 17, 117–123.

Finney, D. J. (1947), “The Principles of Biological Assay,” Supplement to the Journal of the Royal Statistical Society, 9, 46–91.

Finney, D. J. (1978), Statistical Method in Biological Assay, London: Charles Griffin & Co. Ltd.

Glass, G. V. (1976), “Primary, Secondary and Meta-analysis Research,” Educational Researcher, 10, 3–8.

González, V., Díez-Ortiz, M., Simón, M., and van Gestel, C. A. M. (2011), “Application of Bioassays with Enchytraeus Crypticus and Folsomia Candida to Evaluate the Toxicity of a Metal-contaminated Soil, Before and After Remediation,” Journal of Soils and Sediments, 11, 1199–1208.


Gottschalk, P. G. and Dunn, J. R. (2005), “Measuring Parallelism, Linearity, and Relative Potency in Bioassay and Immunoassay Data,” Journal of Biopharmaceutical Statistics, 15, 437–63.

Govindarajulu, Z. (2001), Statistical Techniques in Bioassay, Basel: Karger, 2nd ed.

Hardy, R. J. and Thompson, S. G. (1996), “A Likelihood Approach to Meta-analysis With Random Effects,” Statistics in Medicine, 15, 619–629.

Hauck, W., Capen, R., Callahan, J., De Muth, J. E., Hsu, H., Lansky, D., Sajjadi, N., Seaver, S., Singer, R. R., and Weisman, D. (2005), “Assessing Parallelism Prior to Determining Relative Potency,” PDA Journal of Pharmaceutical Science and Technology, 59, 127–137.

Hess, A. (1929), “The History of Rickets,” in Rickets, Including Osteomalacia and Tetany, Philadelphia: Lea & Febiger, pp. 22–37.

Irwin, J. O. (1937), “Statistical Method Applied to Biological Assays,” Royal Statistical Society, 4, 1–60.

Irwin, J. O. (1950), “Biological Assays with Special Reference to Biological Standards,” The Journal of Hygiene, 48, 215–238.

Jeffcoate, S. (1996), “The Role of Bioassays in the Development, Licensing and Batch Control of Biotherapeutics,” Trends in Biotechnology, 14, 121–124.

Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2005), Applied Linear Statistical Models, New York: McGraw-Hill/Irwin, 5th ed.

Laska, E. M. and Meisner, M. J. (1987), “Statistical Methods and Applications of Bioassay,” Annual Review of Pharmacology and Toxicology, 27, 385–97.

Lunenfeld, B. (2004), “Historical Perspectives in Gonadotrophin Therapy,” Human Reproduction Update, 10, 453–467.

Mackay, D. W., Holmes, P. J., and Redshaw, C. J. (1989), “The Application of Bioassay Techniques to Water Pollution Problems - The United Kingdom Experience,” Hydrobiologia, 188-189, 77–86.

Markel, H. (2007), Long Ago Against Diphtheria, the Heroes were Horses (accessed 2015/11/26), vol. 1, New York: The New York Times.

McCollum, E. V. and Davis, M. (1913), “The Necessity of Certain Lipins in the Diet During Growth,” Journal of Biological Chemistry, 25, 167–175.


McCollum, E. V., Simmonds, N., Becker, J. E., and Shipley, P. G. (1922), “An Experimental Demonstration of the Existence of a Vitamin Which Promotes Calcium Deposition,” Journal of Biological Chemistry, 53, 293–298.

Meisner, M., Kushner, H. B., and Laska, E. M. (1986), “Multivariate Combining Bioassays,” Biometrics, 42, 421–427.

Mellanby, E. (1919), “An Experimental Investigation on Rickets,” Lancet, 1, 407–412.

Morse, P. M. and Bickle, A. (1967), “The Combination of Estimates from Similar Experiments, Allowing for Inter-experiment,” Journal of the American Statistical Association, 62, 241–250.

Müller, K. M., Gempeler, M. R., Scheiwe, M. W., and Zeugin, B. T. (1996), “Quality Assurance for Biopharmaceuticals: An Overview of Regulations, Methods and Problems,” Pharmaceutica Acta Helvetiae, 71, 421–438.

Mzolo, T., Goris, G., Talens, E., Di Bucchianico, A., and Van den Heuvel, E. (2015), “Statistical Process Control Methods for Monitoring In-house Reference Standards,” Statistics in Biopharmaceutical Research, 7, 55–65.

Mzolo, T., Hendriks, M., and Van den Heuvel, E. (2013), “A Comparison of Statistical Methods for Combining Relative Bioactivities from Parallel Line Bioassays,” Pharmaceutical Statistics, 12, 375–384.

Nastoupil, L. J., Rose, A. C., and Flowers, C. R. (2012), “Diffuse Large B-cell Lymphoma: Current Treatment Approaches,” Oncology, 26, 488–95.

Page, E. (1954), “Continuous Inspection Schemes,” Biometrika, 41, 100–115.

Rampling, R., James, A., and Papanastassiou, V. (2004), “The Present and Future Management of Malignant Brain Tumours: Surgery, Radiotherapy, Chemotherapy,” Journal of Neurology, Neurosurgery and Psychiatry, 75, ii24–ii30.

Roberts, S. (1959), “Control Charts Tests Based on Geometric Moving Averages,” Technometrics, 1, 239–250.

Salvador, J.-P., Adrian, J., Galve, R., Pinacho, D. G., Kreuzer, M., Sánchez-Baeza, F., and Marco, M.-P. (2007), “Chapter 2.8 Application of Bioassays/Biosensors for the Analysis of Pharmaceuticals in Environmental Samples,” Comprehensive Analytical Chemistry, 50, 279–334.

Searle, S., Casella, G., and McCulloch, C. (1992), Variance Components, New Jersey: John Wiley & Sons, Inc.


Searle, S. R. (1971), Linear Models, New York: John Wiley & Sons.

Sidwell, R. W. and Smee, D. F. (2000), “In Vitro and In Vivo Assay Systems for Study of Influenza Virus Inhibitors,” Antiviral Research, 48, 1–16.

Takigami, H., Suzuki, G., and Sakai, S. (2008), “Application of Bioassays for the Detection of Dioxins and Dioxin-like Compounds in Wastes and the Environment,” Interdisciplinary Studies on Environmental Chemistry-biological Responses to Chemical Pollutants, 87–94.

USP <1030> (2010), “Biological Assay Chapters-Overview and Glossary,” Tech. rep., United States Pharmacopeia.

USP <1032> (2010), “Design and Development of Biological Assays,” Tech. rep., United States Pharmacopeia.

USP <1033> (2010), “Biological Assay Validation,” Tech. rep., United States Pharmacopeia.

USP <1034> (2010), “Analysis of Biological Assays,” Tech. rep., United States Pharmacopeia.

Van Kessel, G., Geels, M. J., De Weerd, S., Buijs, L. J., De Bruijni, M. A. M., Glansbeek, H. L., Van den Bosch, J. F., Heldens, J. G., and Van den Heuvel, E. R. (2012), “Development and Qualification of the Parallel Line Model for the Estimation of Human Influenza Haemagglutinin Content Using the Single Radial Immunodiffusion Assay,” Vaccine, 30, 201–209.

Yellowlees, A., Lebutt, C. S., Hirst, K. J., Fusco, P. C., and Fleetwood, K. J. (2013), “Efficient Analysis of Dose-time-Response Assays,” BioScience, 63, 490–498.

Youngdahl, K. (2010), Early Uses of Diphtheria Antitoxin in the United States (accessed


Equivalence testing for similarity in bioassays: A critical note

Abstract

Similarity in bioassays means that the test preparation behaves as a dilution of the standard preparation with respect to their biological effect. Thus, similarity must be investigated to confirm this biological property. Historically, this was typically done with traditional hypothesis testing, but this approach has received substantial criticism. Failing to reject similarity does not imply that the two preparations are similar. Also, rejecting similarity when the bioassay variability is small might simply demonstrate a non-relevant deviation from similarity. To remedy these concerns, equivalence testing has been proposed as an alternative to traditional hypothesis testing, and it has found its way into the official guidelines. However, the consequences of equivalence testing for similarity on the relative bioactivity of the test preparation have not been fully investigated. This chapter provides a general framework on equivalence that is directly related to the relative bioactivity. It is demonstrated that non-similarity can never imply equivalence on the relative bioactivity in general, but only on a finite interval for the dose range. Additionally, several case studies show that reasonable finite dose ranges lead to unrealistic numbers of test units required to demonstrate bioequivalence of a test preparation in the bioassay. Although our general framework is theoretically appropriate for equivalence testing of similarity, we argue that it might be too impractical to execute.

Keywords: relative bioactivity; slope-ratio; S-shaped response curve; quantal bioassays;


2.1 Introduction

The main objective in bioassay analysis is to estimate the relative potency or bioactivity of a test preparation with respect to a reference or standard preparation (Finney, 1978). The relative bioactivity is the ratio of dose $x_S$ of the standard preparation and dose $x_T$ of the test preparation that would generate the same predefined biological effect $y$ (e.g., $y$ = 10% adverse events in mice). This ratio $\rho(x_S)$ may vary with the predefined biological effect $y$ and dose $x_S$ through the inverse dose-response relationship for the standard preparation. In the special case that the relative bioactivity is independent of dose $x_S$, that is, $\rho(x_S) \equiv \rho_0$ is constant, the test preparation is or behaves as a dilution of the standard preparation or the other way around. This means that the standard and test preparations are biologically similar, since they only differ in concentration and not in their biological response (Finney, 1978). This biological condition is referred to as parallelism or similarity.

The assumption of similarity is almost always statistically tested in the bioassay analysis since its violation could imply that the biological effect of the test preparation administered at doses different from the doses used in the bioassay cannot be predicted from the standard preparation. Medicinal products are often tested at substantially lower doses in the bioassay than the intended doses administered to human subjects. Thus in the most extreme case, non-similarity observed in the bioassay analysis could imply that the test preparation is biologically harmful or not effective at all at the intended doses of the test preparation. We used the words “could imply” because violation of similarity may also be caused by artifacts or design issues in the bioassay. On the other hand, failing to reject similarity in a bioassay analysis does not prove that the test preparation is similar to the standard preparation. Large assay variation will tend to mask non-similarity, while small assay variation may detect irrelevant non-similarity. These issues with similarity have been used by Callahan and Sajjadi (2003) and Hauck et al. (2005) as arguments to introduce equivalence testing on similarity in bioassay analysis as a replacement of the commonly used traditional hypothesis testing.

The equivalence testing approach on similarity has been adopted by the United States Pharmacopeia for bioassay development and validation (USP <1032>; USP <1033>) and bioassay analysis (USP <1034>). It requires a predefined criterion on one or several parameters that are used in the dose-response relationships for the test and standard preparations. For instance, in a parallel line assay, an equivalence criterion is needed either on the difference of the slopes or on the ratio of the slopes for the two preparations (Hauck et al., 2005). This calculation procedure resembles, for instance, equivalence testing on the risk difference or risk ratio in a logistic regression analysis of a binary clinical outcome in clinical trials (Dann and Koch, 2008; Julious and Owen, 2011).
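As a rough illustration of an equivalence criterion on the difference of slopes, the sketch below applies the confidence-interval form of two one-sided tests. It is a simplified sketch under stated assumptions, not the procedure prescribed by the guidelines: the function name, the margin, and the inputs are hypothetical, and a standard normal quantile is used where a t quantile with the assay's degrees of freedom would normally be appropriate.

```python
from statistics import NormalDist

def slope_equivalence(slope_std, slope_test, se_diff, margin, alpha=0.05):
    """Declare equivalence if the (1 - 2*alpha) confidence interval for
    the slope difference lies entirely inside (-margin, margin).

    Assumption: large-sample normal approximation for the difference.
    """
    diff = slope_test - slope_std
    z = NormalDist().inv_cdf(1.0 - alpha)
    lower, upper = diff - z * se_diff, diff + z * se_diff
    equivalent = (-margin < lower) and (upper < margin)
    return (lower, upper), equivalent

# Same observed difference, two hypothetical levels of assay variability.
print(slope_equivalence(1.00, 1.02, se_diff=0.02, margin=0.1))
print(slope_equivalence(1.00, 1.02, se_diff=0.10, margin=0.1))
```

The two calls share the same observed slope difference. Only the precise assay can demonstrate equivalence, which is the reverse of the traditional test, where large variability tends to mask non-similarity.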


the concept has not been discussed critically enough in literature. For instance, equivalence testing for similarity is not the same as equivalence testing for treatment effects in clinical trials. The latter investigates equivalence of a clinical outcome for two treatments both administered at just one dose, while the former essentially investigates equivalence of a clinical outcome for two treatments at all doses. More specifically, the consequences of equivalence testing for similarity on the size of the relative bioactivity of the test preparation with respect to the standard at the intended doses have never been investigated, nor is the predefined criterion used for the dose-response parameters based on any clinical relevance. Thus, the purpose of this chapter is to formulate equivalence testing of similarity in terms of the relative bioactivity and then discuss its feasibility in quantal and quantitative bioassays.

The chapter is organised as follows. We start with a general formulation of dose-response curves for bioassays and provide expressions for the relative bioactivity for parallel line, slope-ratio, and two-, three-, four-, and five-parameter logistic dose-response curves in the next section. The third section describes traditional hypothesis testing and equivalence testing for similarity. The fourth section illustrates the testing methods on four different case studies and discusses sample sizes for equivalence testing. The final section is a critical discussion of the results and the practical limitations. A manuscript based on this chapter has been submitted for publication.

2.2 Statistical methodology

Let $Y_{ijk}$ be the $k$th (possibly transformed) biological outcome at the $j$th dose for preparation $i$ in a bioassay analysis, with $i = 1, 2$, $j = 1, 2, \ldots, m_i$, and $k = 1, 2, \ldots, n_{ij}$. Preparation $i = 1$ will represent the standard preparation and preparation $i = 2$ will represent the test preparation. The outcome $Y_{ijk}$ can be either continuous (quantitative bioassays), discrete (count bioassays), or binary (quantal bioassays). The design of the bioassay is fully determined by the selection of doses and how test units (e.g., animals or wells on a well-plate) are allocated to treatments (e.g., the doses of preparations). The number of doses will typically be the same for both preparations ($m_i = m$) and the number of replicates the same for each dose ($n_{ij} = n$), but this is not a requirement. We will also assume that test units are randomly assigned to treatments.

A general formulation for the expected biological outcome of any of the three types of bioassays can be described by

$$\mathrm{E}\,Y_{ijk} = \delta_i + (\gamma_i - \delta_i)\, F_{\eta_i}\!\left(\alpha_i + \beta_i \psi(x_{ij})\right), \tag{2.1}$$

with $F_\eta$ a known monotone increasing or decreasing function, parameterised by $\eta = (\eta_1, \eta_2, \ldots, \eta_p)$, $z_{ij} = \psi(x_{ij})$ the selected dose metameter (i.e., a transformation of dose $x_{ij}$), and $\alpha_i$ and $\beta_i$ the intercept and slope, respectively. The slope $\beta_i$ can be selected positive, since an increasing or decreasing dose-response relationship is fully determined by the choice of the function $F_\eta$. Note that the function $F_\eta$ can also be free from any parameter (i.e., $p = 0$). In case the function $F_\eta$ is a distribution function, the parameters $\delta_i$ and $\gamma_i$ represent a lower and upper asymptote, respectively. Finally, the dose-response relationship (2.1) is a very general formulation since it contains all dose-response relationships that are commonly used in bioassay analysis.
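The general formulation (2.1) can be sketched as a single function that takes the link-like function and the dose metameter as arguments. The function and argument names below are hypothetical, and the defaults (identity function, natural-log metameter, asymptotes 0 and 1) are chosen only so that the call with defaults corresponds to a straight line in log dose.

```python
import math
from statistics import NormalDist

def expected_response(x, alpha, beta, delta=0.0, gamma=1.0,
                      F=lambda u: u, psi=math.log):
    """Expected outcome delta + (gamma - delta) * F(alpha + beta * psi(x)).

    `F` plays the role of F_eta and `psi` the dose metameter; any extra
    shape parameters eta are assumed to be bound inside `F`.
    """
    return delta + (gamma - delta) * F(alpha + beta * psi(x))

# Straight line in log dose: F is the identity, delta = 0, gamma = 1.
line = expected_response(math.e, alpha=1.0, beta=2.0)

# Probit curve for a binary outcome: F is the standard normal cdf.
probit = expected_response(1.0, alpha=0.0, beta=1.0, F=NormalDist().cdf)

print(line, probit)
```

Passing `F` and `psi` as arguments keeps one evaluation routine for all model families, mirroring how (2.1) treats them as special cases.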

2.2.1 Dose-response relationships

The parallel line model is obtained by taking the dose metameter $z_{ij} = \psi(x_{ij}) = \log x_{ij}$, with the log function taken as the natural logarithm, the parameters $\delta_i = 1 - \gamma_i = 0$, and the function $F_\eta(z) = z$ the identity function. Relation (2.1) reduces to

$$\mathrm{E}\,Y_{ijk} = \alpha_i + \beta_i z_{ij}. \tag{2.2}$$

The slope-ratio model has the same form as (2.2), but is obtained by choosing dose metameter $z_{ij} = x_{ij}$, $\alpha_i = 1 - \beta_i = 0$, and function $F_\eta(z) = z^{\eta}$. The intercept $\alpha_i$ and slope $\beta_i$ in (2.2) are then replaced by the parameters $\delta_i$ and $\gamma_i - \delta_i$ in (2.1), respectively. Note that the choices for the slope-ratio assay may result in a dose power $\eta_i$ depending on preparation $i$, while choices $z_{ij} = x_{ij}^{\eta}$, $\delta_i = 1 - \gamma_i = 0$, and $F_\eta(z) = z$ lead to a power $\eta$ that must be known and independent of preparation, since the dose metameter is a known transformation that is applied to all doses, irrespective of preparation.

In many bioassays a linear relationship (2.2) in dose metameter $z_{ij} = \psi(x_{ij})$ is often only approximately true (Finney, 1978; Volund, 1978). On the whole domain $\mathbb{R}_{\geq 0}$, the dose-response relationship often has a sigmoid shape. For quantal bioassays the two-parameter probit dose-response relationship is given by

$$\mathrm{E}\,Y_{ijk} = \Phi\left(\alpha_i + \beta_i \log(x_{ij})\right), \tag{2.3}$$

with $\Phi$ the standard normal distribution function. The intercept $\alpha_i$ is related to the well-known ED$_{50}$, the dose concentration that gives an event probability of 50%: setting $\alpha_i + \beta_i \log(x) = 0$ in (2.3) gives $\mathrm{ED}_{50} = \exp(-\alpha_i/\beta_i)$. An alternative model for quantal bioassays is obtained by substituting the logistic distribution function $F(z) = \exp(z)/(1 + \exp(z))$ for the normal distribution function $\Phi$ in (2.3).

The relationship in (2.3) suggests no biological events for a blank dose (when $\beta_i > 0$), but this is not guaranteed in all quantal bioassays. In the context of virus bioassays (Ridout et al., 1993) and microbiology (IJzerman-Boon and Van den Heuvel, 2015), the following dose-response relationship has been proposed

$$\mathrm{E}\,Y_{ijk} = 1 - (1 - \delta_i) \exp(-\beta_i x_{ij}), \tag{2.4}$$

with $\delta_i \in [0, 1)$. The relationship in (2.4) for quantal bioassays is often referred to as the complementary log-log dose-response curve. It fits the formulation in (2.1) by either choosing $z_{ij} = \psi(x_{ij}) = x_{ij}$, $\alpha_i = 1 - \gamma_i = 0$, and $F_\eta(z) = 1 - \exp(-z)$ the exponential distribution function, or $z_{ij} = \psi(x_{ij}) = \log x_{ij}$, $\beta_i = \gamma_i = 1$, and $F_\eta(z) = 1 - \exp(-\exp(z))$ the double exponential distribution function. For the second formulation, $\beta_i$ in (2.4) is equal to $\exp(\alpha_i)$ and different from the $\beta_i$ defined by (2.1).
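The claim that (2.4) fits the general formulation (2.1) in two different ways can be checked numerically. The sketch below uses hypothetical function names and parameter values; it evaluates (2.4) directly and through both parameterisations described above.

```python
import math

def general(x, alpha, beta, delta, gamma, F, psi):
    """General formulation (2.1) with F and psi supplied by the caller."""
    return delta + (gamma - delta) * F(alpha + beta * psi(x))

def cloglog_direct(x, rate, delta):
    """Relationship (2.4); `rate` is the beta appearing in (2.4)."""
    return 1.0 - (1.0 - delta) * math.exp(-rate * x)

def via_exponential(x, rate, delta):
    # psi is the identity, alpha = 0, gamma = 1, F(z) = 1 - exp(-z).
    return general(x, 0.0, rate, delta, 1.0,
                   lambda z: 1.0 - math.exp(-z), lambda d: d)

def via_double_exponential(x, rate, delta):
    # psi is log, beta = gamma = 1, alpha = log(rate),
    # F(z) = 1 - exp(-exp(z)).
    return general(x, math.log(rate), 1.0, delta, 1.0,
                   lambda z: 1.0 - math.exp(-math.exp(z)), math.log)

x, rate, delta = 2.0, 0.3, 0.1  # hypothetical values
print(cloglog_direct(x, rate, delta),
      via_exponential(x, rate, delta),
      via_double_exponential(x, rate, delta))
```

In the second parameterisation the rate enters through the intercept, which is why the $\beta_i$ of (2.4) equals $\exp(\alpha_i)$ there.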

The range of expected biological responses in quantitative bioassays is typically different from the range [0, 1] for quantal bioassays. This implies that to obtain flexibility in the range of expected outcomes, the parameters γi and δi cannot be restricted anymore. However, choices

for ψ and Fη in quantitative bioassays may be similar to quantal bioassays. One particular

dose-response relationship that is applied to quantitative bioassays more than to quantal bioassays is the five parameter logistic curve

EYijk = δi+ (γi− δi) [1 + exp (−αi− βilog (xij))]

−ηi, (2.5)

with $\gamma_i, \delta_i \in \mathbb{R}$ and $\eta_i > 0$. The power parameter $\eta_i$ induces an asymmetric logistic dose-response relationship, which means that it models the curve below and above the $\mathrm{ED}_{50}$ differently (Gottschalk and Dunn, 2005). When the power equals one ($\eta_i = 1$), relationship (2.5) reduces to a symmetric four-parameter logistic curve. It should be noted that Ricketts and Head (1999) discussed another type of five-parameter curve, which falls outside our general formulation (2.1) of dose-response relationships and is outside the scope of this thesis.
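The effect of the power parameter on symmetry can be illustrated with a small sketch of (2.5). The function names and parameter values below are hypothetical. The check relies on the fact that a symmetric logistic curve satisfies $f(x_0 r) + f(x_0 / r) = \gamma_i + \delta_i$ for every $r > 0$, where $x_0 = \exp(-\alpha_i/\beta_i)$ is the dose at which the linear predictor is zero.

```python
import math


def five_pl(x, alpha, beta, gamma, delta, eta):
    """Five-parameter logistic curve (2.5) at dose x > 0."""
    logistic = 1.0 / (1.0 + math.exp(-alpha - beta * math.log(x)))
    # [1 + exp(-z)]^(-eta) is the logistic function raised to the power eta.
    return delta + (gamma - delta) * logistic ** eta


def symmetry_gap(r, alpha, beta, gamma, delta, eta):
    """Return f(x0 * r) + f(x0 / r) - (gamma + delta), x0 = exp(-alpha / beta).

    The gap is zero for every r > 0 when eta = 1 (symmetric curve) and is
    nonzero when eta differs from one and gamma differs from delta.
    """
    x0 = math.exp(-alpha / beta)
    upper = five_pl(x0 * r, alpha, beta, gamma, delta, eta)
    lower = five_pl(x0 / r, alpha, beta, gamma, delta, eta)
    return upper + lower - (gamma + delta)


# Illustrative values only (hypothetical, not fitted to data).
PARAMS = dict(alpha=0.5, beta=1.2, gamma=2.0, delta=0.1)

if __name__ == "__main__":
    for eta in (1.0, 2.0, 0.5):
        gaps = [symmetry_gap(r, eta=eta, **PARAMS) for r in (1.5, 3.0, 10.0)]
        print(f"eta={eta}: " + ", ".join(f"{g:+.4f}" for g in gaps))
```

With $\eta_i = 1$ the gaps vanish up to rounding. Any other positive power gives nonzero gaps, so the lower and upper parts of the curve are no longer mirror images on the log-dose scale.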

2.2.2 Relative bioactivities

The relative bioactivity is defined by the ratio $\rho(x_S)$ of doses that makes the expected biological response of the test preparation at dose $x_T = \rho(x_S)\,x_S$ equal to the expected biological response of the standard preparation at dose $x_S$ (Finney, 1978). Note that in our formulation, a relative bioactivity larger than one ($\rho(x_S) > 1$) makes the standard more potent than the test preparation, because the test preparation requires higher doses than the standard preparation to obtain the same biological response. In terms of our general formulation (2.1), the relative bioactivity

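$\rho(x_S)$ should satisfy the equality

$$\delta_1 + (\gamma_1 - \delta_1)\,F_{\eta_1}\!\left(\alpha_1 + \beta_1 \psi(x_S)\right) = \delta_2 + (\gamma_2 - \delta_2)\,F_{\eta_2}\!\left(\alpha_2 + \beta_2 \psi(\rho(x_S)\,x_S)\right).$$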

Under the assumption that both the dose metameter $\psi$ and the function $F_\eta$ have an inverse function, the solution for the relative bioactivity $\rho(x_S)$ at dose $x_S$ is

$$\rho(x_S) = x_S^{-1}\,\psi^{-1}\!\left(\beta_2^{-1}\left[F_{\eta_2}^{-1}\!\left(\frac{\delta_1 - \delta_2}{\gamma_2 - \delta_2} + \frac{\gamma_1 - \delta_1}{\gamma_2 - \delta_2}\,F_{\eta_1}\!\left(\alpha_1 + \beta_1 \psi(x_S)\right)\right) - \alpha_2\right]\right). \qquad (2.6)$$

This relative bioactivity reduces in complexity when the following parameter restrictions are implemented: $\gamma_1 = \gamma_2$, $\delta_1 = \delta_2$, and $\eta_1 = \eta_2$. These assumptions make the minimum and maximum expected biological responses the same for the two preparations and also make the dose-response relationship $F_\eta$ identical for both preparations. The relative bioactivity is then

$$\rho(x_S) = x_S^{-1}\,\psi^{-1}\!\left(\left[\alpha_1 - \alpha_2 + \beta_1 \psi(x_S)\right]/\beta_2\right). \qquad (2.7)$$

Thus, the relative bioactivity is now independent of the function $F_\eta$ and the parameters $\gamma_1$, $\gamma_2$, $\delta_1$, $\delta_2$, $\eta_1$, and $\eta_2$, but it still depends on the dose metameter $\psi$ and the parameters $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$.
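A minimal sketch of (2.7) makes the role of the dose metameter concrete. It assumes, as suggested by the structure of (2.6), that index 1 refers to the standard and index 2 to the test preparation. The function names and parameter values are hypothetical, and the log metameter is used only as an example.

```python
import math


def relative_bioactivity(x_s, alpha1, beta1, alpha2, beta2, psi, psi_inv):
    """Relative bioactivity (2.7) at standard dose x_s.

    Index 1 is taken to be the standard and index 2 the test preparation
    (an assumption); psi is the dose metameter and psi_inv its inverse.
    """
    return psi_inv((alpha1 - alpha2 + beta1 * psi(x_s)) / beta2) / x_s


def linear_predictor(x, alpha, beta, psi):
    """alpha + beta * psi(x), the argument of F in the general formulation."""
    return alpha + beta * psi(x)


# Illustrative parameter values (hypothetical, not fitted to data).
ALPHA_S, BETA_S = 1.0, 1.5   # standard preparation
ALPHA_T = 0.4                # intercept of the test preparation


def rho_log(x_s, beta_t):
    """Relative bioactivity with the log dose metameter psi(x) = log(x)."""
    return relative_bioactivity(x_s, ALPHA_S, BETA_S, ALPHA_T, beta_t,
                                math.log, math.exp)


if __name__ == "__main__":
    for x_s in (0.5, 1.0, 4.0):
        equal_slopes = rho_log(x_s, BETA_S)   # beta_1 = beta_2
        unequal_slopes = rho_log(x_s, 2.0)    # beta_1 != beta_2
        print(f"x_S={x_s}: rho={equal_slopes:.4f} (equal slopes), "
              f"rho={unequal_slopes:.4f} (unequal slopes)")
```

With equal slopes the relative bioactivity is the same at every dose, while unequal slopes make it change with $x_S$. In both cases the test dose $\rho(x_S)\,x_S$ gives the same linear predictor as the standard dose $x_S$, which is what (2.7) requires when $F_\eta$, $\gamma_i$, and $\delta_i$ are shared.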

The relative bioactivities $\rho(x_S)$ for the parallel line Model (2.2), the quantal bioassay (2.3), and the quantitative bioassay (2.5) are all equal to

$$\rho(x_S) = \exp\{(\beta_1/\beta_2 - 1)\log(x_S)\}\,\exp\{(\alpha_1 - \alpha_2)/\beta_2\},$$

when $\gamma_1 = \gamma_2$, $\delta_1 = \delta_2$, and $\eta_1 = \eta_2$. This expression always holds for the parallel line model and the two-parameter probit and logit models when the same sigmoid dose-response relationship is fitted to both preparations ($\eta_1 = \eta_2$). For the slope-ratio Model (2.2), the relative bioactivity $\rho(x_S)$ is different due to the different dose metameter. In terms of the parameters in (2.2), it is given by

$$\rho(x_S) = \left[\frac{\alpha_1 - \alpha_2 + \beta_1 x_S^{\eta}}{\beta_2 x_S^{\eta}}\right]^{1/\eta},$$

when $\eta_1 = \eta_2 = \eta$ with the dose metameter in (2.2) taken as $\psi(x) = x^{\eta}$. Since the quantal bioassay in (2.4) can be obtained in two different ways from formulation (2.1), using either a dose metameter $\psi(x) = \log(x)$ or $\psi(x) = x$, and the relative bioactivity $\rho(x_S)$ in (2.7) depends only on the dose metameter (when $\delta_1 = \delta_2$), the set of parameters in (2.1) that generates (2.4) satisfies a specific constraint ($\alpha_i = 0$ or $\beta_i = 1$) that makes the relative bioactivity dose independent. Indeed, the formulation with the exponential distribution has relative bioactivity $\rho(x_S) = \beta_1/\beta_2$, while the formulation with the double exponential distribution has relative bioactivity $\rho(x_S) = \exp\{\alpha_1 - \alpha_2\}$. These relative bioactivities are of course identical, since the
