• No results found

Ligand-Protein binding free energy calculations of CYP450 BM3 mutants with testosterone to assess the stereoselectivity of hydroxylation upon mutation

N/A
N/A
Protected

Academic year: 2021

Share "Ligand-Protein binding free energy calculations of CYP450 BM3 mutants with testosterone to assess the stereoselectivity of hydroxylation upon mutation"

Copied!
57
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

MSc

Chemistry

Molecular Sciences

Research Minor Project

Free energy calculations on

testosterone hydroxilation by

CYP450 mutants; testing AEDS and

XTI in the GROMOS environment.

by

Xavier

Fernandez-Luengo

Flores

11445203

August 2019

24 EC

March-August

2019

Supervisor/Examiner:

Examiner:

Daan Geerke, PhD

Jocelyne Vreede, PhD

Rosa Luirink, PhD Student

(2)
(3)

iii

UNIVERSITEIT VAN AMSTERDAM

Abstract

MSc in Chemistry

by Xavier FERNANDEZ-LUENGOFLORES

The Cytochromes P450 (CYP450) constitute a significant group of oxidative enzymes that can introduce an oxygen atom with high regio and stereose-lectivity. Upon directed evolution of the bacterial CYP450 of Bacillus mega-terium, CYP450 BM3, mutants with a very high turnover ratio are obtained. The BM3 mutant M11 is able to metabolize a wide range of drugs and drug-like compounds. Upon another cycle of mutations, the M11 V87I is able to hydroxylate testosterone on the 16th carbon with a highly preferred stereo-chemistry (β-hydroxylation). Interestingly enough, one additional mutation in the active site (S72I), inverts the stereochemistry of the reaction, resulting in hydroxylation of testosterone at the 16th carbon giving the α hydroxylated product. The goal of this project is to use computer simulation methods to calculate the binding free energies differences between hydroxylation reac-tions via extended thermodynamic integration (XTI), thermodynamic inte-gration (TI) and accelerated enveloping distribution sampling (AEDS) using the GROMOS software and try to elucidate if this change in stereoselectivity upon mutation of a single residue within the protein active site could have been predicted retrospectively.

(4)
(5)

v

Acknowledgements

Special thanks to D. Geerke and R. Luirink, for being amazing supervisors during this project in the good and bad times. They made me feel welcome in their group and helped me improve as a scientist.

To Koen, Mark and Eko for amazing climbing sessions, plan B’s, random lunch conversations, lots of laughs and keeping my sanity on acceptable lev-els.

To all the MOLTOX people that made this cold, cold office a colorful and warmer place (of course, in the metaphorical sense): Paul, Sandra, Giada, Liliana, Vidya, Ruthmila, Ellen, Anja and Cormac.

To Yannick Bisschops, for being an amazing and supportive friend during this second part of my stay in the Netherlands. I will miss you, mate.

(6)
(7)

vii

Contents

Abstract iii Acknowledgements v 1 Introduction 1 1.1 Contextualization . . . 2 1.2 Objectives . . . 4 2 Methodology 7 2.1 Theoretical background . . . 7

2.1.1 Extended thermodynamic integration . . . 8

2.1.2 Accelerated enveloping distribution sampling . . . 11

2.2 Simulation work-frame . . . 15

2.2.1 XTI methodology . . . 17

2.2.2 AEDS methodology . . . 19

3 Results and discussion 21 3.1 XTI on α and β poses . . . 21

3.2 AEDS on α and β poses . . . 25

3.2.1 Parameter search . . . 26

3.2.2 Production run α-pose . . . 28

4 Conclusions and future perspectives 37 4.1 XTI . . . 37

4.2 AEDS . . . 38

4.3 Future perspectives . . . 38

(8)
(9)

ix

List of Figures

1.1 Catalytic cycle of the CYP450s. The reaction sequence starts at the top, with the CYP450 initial state having a ferric 6-coordinate low spin heme iron axially coordinated by cysteine thiolate (S) and a weakly bound water molecule. The substrate (RH) dis-places the water and binds with the iron, changing its spin from low to high, beginning the redox activity. The subsequent redox stages of the heme iron, two electron transfers and the oxygen binding, culminate in the ferryl-oxo porphyrin radical cation, which is considered to be the dominant oxidant agent of CYP450 reactions. It forms the hydroxylated product (ROH) and then the CYP450 is restored to its resting state by the co-ordination of a water molecule (Belcher et al.,2014). . . 2

1.2 Molecular representation of CYP450 BM3 M11 double mutant with VMD. The heme cofactor is represented in orange and the non-polar residues in white, basic residues in blue, acidic residues in red and polar residues in green. . . 3

2.1 Thermodynamic cycle for the system of study. The cycle con-sists of the enzyme in both mutation states, single and double and the substrate (testosterone) ready for α and β hydroxyla-tion. Here, the vertical ΔG are the free energy changes gener-ated upon ligand conversion within the protein and the hori-zontal ΔG are the binding free energies calculated when mu-tating the protein. The magnitude of interest and the one that will be calculated is ΔΔGbind, the relative binding free energy between both ligand configurations. . . 8

2.2 Schematic overview of the calculation using extended TI of �H(λ)

∂λ

λ. The calculated points (red) are used to generate the

energy curve, and using the predictive nature of the method, the black curve can be predicted. The steeper sections of the curve need many points to be calculated accurately, otherwise leading to non-representative energetic curves (Ruiter and Oost-enbrink,2016). . . 10

(10)

x

2.3 Reference state Hamiltonians (black solid lines) for the EDS approach (top) and the AEDS approach (bottom). The origi-nal harmonic potential energy functions are shown with gray solid and brown dotted lines. The values for Eminand Emaxare displayed as red lines on the bottom picture. The Hamiltonian shown with brown solid lines was shifted using a free-energy offset parameter for the construction of the EDS reference state Hamiltonian. The results show how the AEDS approach is able to make a Hamiltonian that describes the system in a more accurate way (Perthold and Oostenbrink,2018). . . 14

2.4 Representation of the dual topology for the 72ndresidue of the CYP450 BM3 M11 mutant and the chemical structures of ser-ine and isoleucser-ine. In the dual topology picture, the serser-ine amino acid is depicted, with carbon atoms in green, oxygen in red, nitrogen in blue and hydrogens in white. The dummy atoms are depicted in brown, and correspond to the atoms of isoleucine. During the perturbation, the OH terminus of ser-ine will become a dummy appendix while the three existent dummy atoms will change to conform the isoleucine structure. 16

2.5 Left: Starting configuration of the α-pose of BM3 M11. Right: Starting configuration of the β-pose of BM3 M11. The images have been obtained with VMD. The testosterone and the heme cofactor are depicted in bonds, with the α and β hydrogens depicted using CPK spheres in orange and white respectively. 18

3.1 Analysis of the extended TI (black) and normal TI (red) for the CYP450 BM3 S72I mutant on the α-pose for the first nanosec-ond of the run. Only the error bars of the standard TI could be provided since GROMOS does not include that functionality on the XTI. . . 22

3.2 Snapshot of the system during the prolongation run of the XTI simulation at λ = 0.1. Testosterone escapes the active site at low values of λ, where a β configuration is more stable and displays a hydrogen bond with the heme cofactor. This re-sult evidences the use of soft restrains in future simulations to prevent the detachment of the ligand and ensure an adequate sampling.. . . 23

3.3 Distance analysis of the XTI prolongation run for λ = 0.1.

Each frame equals 2 picoseconds, for a total simulation time of 9 nanoseconds of prolongation. During the first moments the structure maintains a stable α-pose but eventually detaches from the active site leading to a decoupled state where testos-terone has left the active site. . . 24

3.4 State coverage graphs of the parameter search simulations for σ = 1, 2, 3 of the β-pose for 20 nanoseconds simulation. The graphs show which state the system is visiting over time and how much time it remains in that state. . . 27

(11)

xi

3.5 Plots of the Emin, Emaxand Eo f f obtained during the 20 nanosec-onds parameter search simulations for the β-pose system. The graphs do not show convergence of the Eo f f towards a specific value but they oscillate without a defined limit. The Emin pa-rameter does not reach a stable value in the cases of σ = 2, 3,

which evidences the need for a better parameter search. . . 30

3.6 Distance analysis of the H16-α (in black) and H16-β (in red) during the parameter search simulation for the three different values of σ. The distance between the given hydrogen and the iron center of the heme cofactor determines if the system main-tains the original β-pose. The gaps in the graphs are the result of corrupted simulation files that were not written correctly. . 31

3.7 Snapshot of the final configuration of the BM3 mutant on the 2σ simulation. The distance between H16-β (orange ball) and the iron on the heme cofactor is displayed as well as the hy-drogen bond between serine 72 and the testosterone ligand. . 32

3.8 Parameter search simulations for the serine to isoleucine mu-tation in water. The decay time has been modified and risen 100 ps every 10 ns. Values for the three required parameters show convergence during the first 40 nanoseconds. It can be seen how Emax remains almost constant in the three simula-tions while Emin goes up with σ and the Eo f f decreases as σ increases. . . 33

3.9 Graphs for Emin, Emaxand Eo f f obtained in the parameter search simulation for the β-pose with increasing decay time at differ-ent values of σ. The structures maintained a β-pose during the simulated time (not shown). Even though the simulations are not finished, the tendency towards a converged value of the energy offset can be seen in all three graphs. This improve-ment in the results for the parameter search can be seen clearly if the graphs are compared with the ones in figure 3.5. . . 34

3.10 State coverage of the production AEDS run for the α-pose. It is clear that during most part of the simulation only one end state is being visited, and thus the sampling acquired is inadequate for the correct analysis of the system’s properties. . . 35

3.11 Distance analysis between α (black) and β (red) hydrogens and the iron of the heme cofactor for the AEDS production run. The fact that the initial configuration is not maintained during the production run shows how the system’s energy landscape has been inappropriately constructed and how additional con-strains on the geometry of the system may be required to sta-bilize certain poses. . . 35

(12)
(13)

xiii

A la memòria del meu avi Antonio. T’estimaré i et

portaré en el record mentre visqui. No t’imagines

(14)
(15)

1

Chapter 1

Introduction

The cytochromes P450 (CYP450) make up a superfamily of enzymes that have heme as a cofactor. The name P450 originates from the IR peak that these compounds exhibit at their absorption maximum. This peak appears when the enzyme is in its reduced form and complexed with carbon monox-ide. CYP450s need an external electron donor to perform their catalytic ac-tivity, which is known as a redox partner. Depending on the redox partner, CYP450s are classified into class I or class II, requiring two or one redox part-ners respectively (Bernhardt,2006).

CYP450s are involved in endogenous processes and also in the transforma-tion of xenobiotics, thus playing an important role in the metabolism of drug-like compounds and their toxicological and pharmacological effects. Most enzymes of this family perform mono-oxygenation reactions on C-H bonds, which results in a hydroxylated product and the production of a water molecule. These hydroxylations reactions are highly regio- and stereo-specific and in most cases contribute to the activation of a C-H bond. Normally, the redox partners of CYP450s are complex and expensive biomolecules, like flavine adenine dinucleotide (FAD), flavine mononucleotide (FNM) or Fe-S ferre-doxin which makes their implementation as industrial biocatalysts difficult. The most promising CYP450 for biocatalysis purposes are the enzymes de-rived from bacterial CYP450 BM3 of Bacillus megaterium, which show the highest activity reported for CYP450s (Warman et al., 2005). In addition to that, CYP450 BM3 is soluble in water and uses a class II redox system, which is fused to the enzyme on the C-terminus of the heme domain, making it a more suitable candidate for biocatalysis than human-derived CYPs, which are integral membrane proteins that depend on external redox partners to perform their functions. The natural substrate of CYP450 BM3 are long chain fatty acids, but upon mutation by means of site-directed mutagenesis or di-rected evolution they display high enzymatic promiscuity (Glieder, Farinas, and Arnold,2002), being able to hydroxylate a wide variety of compounds, which makes them an interesting target to catalyze many industrial reactions. The catalytic activity of CYP450s is located at the heme region of the enzyme, which is highly predominant in the protein structure. For the whole pro-tein of 119 kDa, the heme domain weighs 55 kDa and it is fused to the 65

(16)

2 Chapter 1. Introduction

FIGURE 1.1: Catalytic cycle of the CYP450s. The reaction

se-quence starts at the top, with the CYP450 initial state having a ferric 6-coordinate low spin heme iron axially coordinated by cysteine thiolate (S) and a weakly bound water molecule. The substrate (RH) displaces the water and binds with the iron, changing its spin from low to high, beginning the redox activ-ity. The subsequent redox stages of the heme iron, two electron transfers and the oxygen binding, culminate in the ferryl-oxo porphyrin radical cation, which is considered to be the domi-nant oxidant agent of CYP450 reactions. It forms the hydrox-ylated product (ROH) and then the CYP450 is restored to its resting state by the coordination of a water molecule (Belcher

et al.,2014).

kDa reductase polypeptide mentioned before. The mono-oxygenation of tar-get molecules is achieved by the interaction of a ferryl-oxo porphyrin radical cation intermediate with the substrate. Atmospheric oxygen is taken as reac-tant in the formation of the compound. The depiction of the catalytic cycle in detail can be seen in figure1.1.

1.1 Contextualization

At the MolTOX laboratory in the Vrije Universiteit of Amsterdam, new BM3 mutants were designed by combining experimental and computational ef-forts. These mutants, the M01 and M11, are capable of metabolizing many drug-like compounds, medications and steroids (Capoferri et al.,2016). The structure of BM3 M11 can be seen in figure1.2.

(17)

1.1. Contextualization 3

FIGURE 1.2: Molecular representation of CYP450 BM3 M11

double mutant with VMD. The heme cofactor is represented in orange and the non-polar residues in white, basic residues in

blue, acidic residues in red and polar residues in green.

Previous studies with mutations of these enzymes have shown that both M01 A82W and M11 V87I single mutants are able to hydroxylate testos-terone (TES) at the 16β-position with high stereo-specificity (roughly 100% ee) (Venkataraman et al.,2012). The hydroxylation of testosterone and other steroids by CYP450s is an important step that activates the substrate for fur-ther reactions and has relevance in the synthesis of hydroxylated steroid vari-ants used for therapeutic purposes, since in many cases they require complex synthetic pathways that may lead to many side products (Fernandes et al.,

2003). Both mutants, upon performing a mutation at the active site by chang-ing the serine on residue 72 for isoleucine (S72I), showed an inversion on the stereo-chemistry of the final product, obtaining the 16α-OH testosterone (95% ee).

A physical magnitude of interest when rationalizing a protein-ligand inter-action is the relative binding free energy difference (ΔΔGbinding). This magni-tude gives insight on the recognition process of the protein when binding to biologically relevant ligands or inhibitors, also assessing the stability of the

(18)

4 Chapter 1. Introduction interaction. The difference between absolute and relative binding free ener-gies lies in the transformation required to calculate the magnitude; absolute binding free energies are difficult to obtain since they require the formation of the ligand from nothing to be calculated while relative binding free en-ergies are obtained by transforming ligand A into ligand B, which can be done by means of alchemical transformations and it is computationally more viable. The formalism where relative binding free calculations are derived from is exact from a statistical mechanics stand point and becomes the op-tion of choice over absolute binding free energies when trying to raop-tionalize protein-ligand interactions in computational chemistry due to its robustness and exact background, with the main downside of requiring long simulation times.

The focus of this work will be the retrospective prediction of the inversion of stereochemistry in testosterone upon mutation of the active site. To this end, the change in free energy upon mutating the enzyme with the substrate in one of the two studied conformations will be computationally calculated. Once the free energy differences are acquired, the relative binding free energy of the final compounds (α or β 16-OH testosterone) in both mutants will be calculated and it will be determined if the outcome of the reaction could have been predicted a priori.

To perform the calculation of the binding free energy difference between the systems, extended thermodynamic integration (XTI) (Ruiter and Oost-enbrink,2016) will be used, which is a more computationally efficient coun-terpart of thermodynamic integration (TI), while also obtaining the relative binding free energy with traditional TI as a mean to test the XTI results. As an additional way to calculate binding free energies, the enveloping distribu-tion sampling method (EDS) will be used, more specifically a new approach called accelerated EDS (AEDS) that will be described in future sections. The results obtained with both methodologies will be compared in an attempt to assess the quality of the analysis.

1.2 Objectives

1. Perform the necessary calculations to obtain the relative binding free energy difference for both α and β hydroxylation products of testos-terone with both M11 mutants.

2. Test the viability and performance of the XTI and AEDS while provid-ing ΔΔG values with both methods to compare the obtained values in order to assess the quality of the analysis.

3. Perform a thorough analysis of chemical factors (bond distances, an-gles, H-bond formation, etc.) to rationalize the hydroxylation of testos-terone.

(19)

1.2. Objectives 5 4. Assess whether the inversion of stereochemistry upon single point

(20)
(21)

7

Chapter 2

Methodology

2.1 Theoretical background

Calculating the binding free energy of a protein-ligand complex is generally a very computationally demanding task. More conventional approaches, such as thermodynamic integration (TI), alchemical free energy perturba-tion (FEP) or the Bennett’s Acceptance Ratio (BAR), are rigorous alchemical methods used to calculate relative binding free energies of compounds with the drawback of having long simulation times. Lately, approaches that aim at improving accuracy and efficiency of free energy calculations have been developed (Kim and Allen, 2012, Steinbrecher, Joung, and Case,2011), but alchemical methodologies still require nowadays long simulation times to obtain results.

Alchemical free energy transformations are an accurate methodology to cal-culate free energy differences between protein-ligand complexes and other biochemical systems (Chodera et al.,2011). The first step is the construction of a thermodynamic cycle that involves the initial and the final state of a tar-get system. Then, the system is brought from the initial state A to the final state B via the progressive non-physical modifications in many intermediates of the involved species until the final state is reached.

These alchemical transformations are non-physical in the sense that certain atoms of the system are mutated into another set of atoms with different characteristics by means of internal parametrization. The mutations have to be done in a progressive and slow way to ensure that there is enough over-lapping between the phase space of the initial system and the final one, since in alchemical free energy methods adequate sampling is crucial to obtain re-liable results. Some free energy problems cannot be dealt with by means of alchemical transformations i.e. when the perturbations are too big and there is insufficient overlapping between initial and end state. In these cases, changes in the system need to be applied in many steps to solve the lack of sufficient overlap between perturbed systems or a different methodology needs to be used.

The theoretical background for alchemical free energy calculations was mainly developed by Kirkwood (Kirkwood, 1935) and Zwanzig (Zwanzig, 1954),

(22)

8 Chapter 2. Methodology

FIGURE2.1: Thermodynamic cycle for the system of study. The

cycle consists of the enzyme in both mutation states, single and double and the substrate (testosterone) ready for α and β hy-droxylation. Here, the vertical ΔG are the free energy changes generated upon ligand conversion within the protein and the horizontal ΔG are the binding free energies calculated when mutating the protein. The magnitude of interest and the one that will be calculated is ΔΔGbind, the relative binding free

en-ergy between both ligand configurations.

among others. Kirkwood introduced the idea of the coupling parameter λ and laid the basis of thermodynamic integration. Zwanzig stated that a free energy difference between two states can be computed using the exponential average of energy differences over an ensemble of configurations, laying the base for free energy perturbation methods.

The thermodynamic cycle used in this work can be seen in detail in figure

2.1. The magnitude of interest, ΔΔGbind, can be expressed in terms of the

binding free energy differences that result upon the mutation of the initial single mutant into the double mutant, which can be obtained by means of computer simulations. By calculating the binding free energies differences between the two mutants, one can indirectly probe the system for the differ-ence of binding free energy between the two different α and β configurations of the ligand.

2.1.1 Extended thermodynamic integration

Extended thermodynamic integration (XTI) is a revised and optimized ver-sion of the classical method of thermodynamic integration that includes the prediction of non simulated λ-points to speed up the computational process (Ruiter and Oostenbrink,2016). This method has been reported to have the same accuracy as the standard TI, obtaining similar results as its counterpart with less simulated λ-points. To obtain the energy landscape on which nu-merical integration will be performed, a choice of λ points needs to be made. The values for TI will be calculated at these values of λ and they need to be

(23)

2.1. Theoretical background 9 chosen appropriately to ensure that the resulting landscape is representative of the system: an equidistant and sufficient number of λ points between 0 and 1, where 0 is the non-perturbed system and 1 is the system completely perturbed.

The basis for thermodynamic integration is given by statistical thermody-namics, which states that the expected value of the derivative of the Hamil-tonian of a system with respect to the perturbation parameter λ, gives the binding free energy when integrated for all the values of λ.

ΔGTIAB = � 1 0 � H(λ) ∂λ (2.1)

The integral on equation2.1has to be solved numerically, and thus regions of the space that exhibit high curvature will require a higher density of λ-points around them for the result to be accurate. The main advantage of XTI is that with the prediction of non simulated λ-points, this process is optimized to give accurate results and prevent oversampling of the space. The approach that will be used in this work is implemented in the GROMOS simulation software (Christen et al.,2005), and uses a soft-core potential for the atomic interactions to prevent singularity problems when creating or annihilating particles of the system (Beutler et al., 1994). This soft-core potentials keep pairwise interaction energies finite for all configurations of the system and provide smooth free energy curves, also influencing the final result of the calculation and its efficiency.

An example of the energy curve obtained by deriving the Hamiltonian re-spect to the λ parameter can be seen in figure 2.2. The simulation of the system at a λ point is computationally expensive, but the main advantage of the XTI algorithm is that at each λ-point it computes the necessary cou-pling parameters µx that allow the prediction of �H∂λ(λ)�λ at non-simulated

λ-points.

The coupling parameters µxare defined in GROMOS as a fourth degree

poly-nomial for every interaction type. They scale the interactions between atoms and their correspondent softness. The coefficients of the polynomial are cho-sen such that µx is zero for λ = 0 and µx =1 for λ= 1. At every simulated

λ-point (λs), the algorithm calculates an array of µix ={0, 0.01, 0.02, ..., 1}that

allows the prediction of the state of the system at non simulated (predicted)

λ-points (λp). The prediction also involves the scaling and re-weighting of

some energetic contributions, since the predictions that are made for λp are

calculated from a configurational ensemble at λs. The conclusive result can

be seen in equation2.2: �H(r, p, λ p) ∂λλp(λs) = � H(r,p,λp) ∂λ � � � λs e −β[H(r,t,λp)−H(r,t,λs)] � λs �e−β[H(r,t,λp)−H(r,t,λs)] λs (2.2)

(24)

10 Chapter 2. Methodology

FIGURE 2.2: Schematic overview of the calculation using

ex-tended TI of �∂H(λ)

∂λ

λ. The calculated points (red) are used to generate the energy curve, and using the predictive nature of the method, the black curve can be predicted. The steeper sections of the curve need many points to be calculated accu-rately, otherwise leading to non-representative energetic curves

(Ruiter and Oostenbrink,2016).

With β being(kbT)−1 and kbthe Boltzmann constant. With this system, one

can predict�H(λ)

∂λ

for as many λ

pas wanted, ranging from 0 to 1, but taking

into consideration that equation 2.2 is only reliable when there is enough overlapping between the phase space of subsequent systems i.e. when the difference between λsand λp is not too large.

This method excels when combining predictions to refine the choice of sim-ulated points. In principle, it is possible to predict �H(λ)

∂λ

for all λ values ranging from 0 to 1, but this prediction will be less reliable the further away

λsand λp are. This introduces a solution in the form of an additional

simu-lated λ-point, such as λs1 <λp <λs2, that will be used to determine, the next

λs. The idea of this approach is to introduce the next λswhere neighbouring

predictions disagree most, using the weighted differences: �H(r, t, λ p) ∂λλp =ws1 �H(r, t, λ p) ∂λλp(λs1) +ws2 �H(r, t, λ p) ∂λλp(λs2) (2.3) With ws1 = λp−λs2 λs1λs2 and ws2 = λp−λs1 λs2λs1.

The scheme to follow is, for every λp, calculate the absolute difference

be-tween the predictions from λs1and λs2, weighting them by w= (λs2−λs1)−

(25)

2.1. Theoretical background 11 close to one of the simulated states, find the absolute weighted maximum, simulate it as the next λsand repeat as an iterative process.

The implementation of this approach as part of the standard workflow of this project was rejected due to its complexity. Instead, as suggested on the original XTI paper, 21 equidistant λ-points between 0 and 1 were simulated for the target protein-ligand complex, taking advantage of the quicker imple-mentation of the XTI algorithm and the smoother resulting energy curve.

2.1.2 Accelerated enveloping distribution sampling

The fundamental difference between thermodynamic integration and envelop-ing distribution samplenvelop-ing (EDS) is the description of the Hamiltionian of the system. While in TI the evolution of the system from the starting state A to the end state B was described using many intermediate states determined by the parameter λ, in EDS only one Hamiltonian is used.

To guarantee that the Hamiltionian of choice accurately represents both states

A and B, a reference Hamiltonian R which phase space envelops the relevant

phase space of states A and B is used (Christ and Gunsteren,2007). If the ref-erence state is adequately constructed and it includes many relevant regions of the phase space of the states Ai(i = 1, 2, ...), several free energy

calcula-tions can be obtained from a single simulation.

There are two main difficulties that this approach needs to overcome when constructing the reference state R. First, the sub-states Ai with higher free

energy will be sampled less often than the ones with lower free energy, lead-ing to poor statistical convergence and incorrect sampllead-ing of the phase space. To solve this, the reference state can be iteratively updated to ensure proper sampling of all the states Ai. The second problem has to do with the

over-lapping of the individual phase spaces of the sub-states Ai. If these states

lie far apart in the phase space and overlapping is not sufficient, the system will get stuck in the initial state Aiand will not explore the rest of the states.

This issue can be considered a limitation of the method and it is advised to only use EDS when the phase space overlapping condition is met to ensure optimal results. Alternatively, this difficulty can be overcome by using meth-ods that help sampling states that lay apart by high energy barriers such as Hamiltonian replica-exchange molecular dynamics (HREMD), at the cost of more computation time.

To compute the free energy difference of two states, the following formula can be used:

ΔFBA =FB−FA =−β−1ln� QQB A

(2.4) Where β is (kbT)−1 and QA,B are the partition functions of the states A and

(26)

12 Chapter 2. Methodology states has to be constructed. In the formalism of the EDS, this partition func-tion is the sum of the partifunc-tion funcfunc-tions of state A and B:

QR(N, V, T) =QA(N, V, T) +QB(N, V, T) = 1 h3NN! � � dpdr � e−βHA(p,r)+eβHB(p,r) � (2.5)

Where h is Planck’s constant, N is the number of particles, V and T the vol-ume and temperature to indicate the ensemble, HA,Bthe Hamiltonians of the

states A and B, and r and p the 3N-dimensional position and momentum vectors used to describe the system. Using this partition function, we obtain the reference Hamiltionian H(R):

HR(p, r) =−β−1ln�e−βHA(p,r)+e−βHB(p,r)� (2.6)

When simulating HR, one obtains the sampling of the phase space of HAand

HB. Then, the free energy difference can be calculated as the ratio QB/QA

from the relative times R spends in B and A. The more correct estimation of the magnitude can be obtained if the average value is taken:

ΔFBA =ΔFBR−ΔFAR =−β−1ln � �e−β(HB−HR)R �e−β(HA−HR)R � (2.7) Before starting the EDS simulation of the system of study on GROMOS, a topology for the state R that combines both states A and B and a perturbation topology will be written.

In practice, when performing an EDS simulation, a parameter search simu-lation needs to be run before the actual production. This simusimu-lation calcu-lates two parameters that are required to ensure an optimal run and that will be used and updated throughout the simulation: the smoothness parameter and the energy offset parameter. It has been shown that these automatically determined parameters lead to very efficient sampling of the relevant con-formations of states Ai(Christ and Van Gunsteren,2009).

The smoothness parameter (0 ≤ s ≤ 1) smoothens the transitions between the possible states Ai, and the energy offset parameter (ERi ) weights and

brings the energies of the states Ai to the same level to ensure proper

sam-pling (Hansen et al.,2012).

In this work, the method recently developed by Oostenbrink et al. will be used to perform the EDS analysis. This novel approach to EDS is called ac-celerated EDS (AEDS) and allows the construction of a reference state Hamil-tonian that preserves local energy minima of the combined end states. It in-cludes a robust and efficient parameter optimization scheme that is aimed

(27)

2.1. Theoretical background 13 to increase the efficiency of the method and the user-friendliness of the ap-proach, minimizing the amount of parameters that the user has to manu-ally tune before obtaining the results of the simulation (Perthold and Oost-enbrink,2018).

To ensure total accessibility of the end states and the preservation of the en-ergy minima, the reference Hamiltonian described in2.6is smoothed using a harmonic potential energy function and the energy landscape is pulled down to a minimum energy parameter Emin. Then, the modification of the reference

state Hamiltonian is restricted to a region between Eminand a maximum

en-ergy parameter Emax in order to preserve the energy landscape around the

local energy minima in better detail. The parameter search simulation pre-vious to the production run on an AEDS simulation calculates the optimal Emin and Emax of the system. In addition, the energy offset parameter has

now been adapted to be the free-energy offset parameter ΔFR, which is the

equivalent of the energy offset parameter in the new formalism of AEDS, which will be explained later on. The continuous AEDS reference Hamilto-nian H∗ R(�r)is defined as: H∗ R(�r) =     

HR(�r)−Emax−2Emin HR(�r)≥Emax

HR(�r)−2(Emax1Emin) Emin <HR(�r) <Emax

HR(�r) HR(�r)≤Emin

(2.8) A comparison between reference state Hamiltonians built using the EDS and the AEDS approach can be seen in figure2.3. During this work, the simula-tions using the AEDS approach were performed in parallel with the XTI ones in order to obtain two sets of binding free energy values for the α and β-pose for later comparison.

Prior to the production run, a set of AEDS parameters that define the energy landscape are obtained iteratively via non-equilibrium simulations called pa-rameter searches. During these simulations the system’s Hamiltonian is ex-plored freely, and the initial and end state are visited and their relative pa-rameters are optimized on the fly. These papa-rameters, the energy minima Emin,

energy maxima Emaxand energy offset Eo f fR or ΔFRare of vital importance to

the production AEDS run since they shape the energy landscape of the sys-tem’s Hamiltonian and their accuracy is directly related to the output of the simulation.

When an end state is visited, its average energy Hi(r)−ERi , the standard

de-viation of that energy σHi(r), and the average of the maximum transition

en-ergy between states E‡max are calculated. Then, the parameter search

magni-tudes Emaxis calculated as the upper border of the energy difference between

any end states:

(28)

14 Chapter 2. Methodology

FIGURE2.3: Reference state Hamiltonians (black solid lines) for

the EDS approach (top) and the AEDS approach (bottom). The original harmonic potential energy functions are shown with gray solid and brown dotted lines. The values for Eminand Emax

are displayed as red lines on the bottom picture. The Hamilto-nian shown with brown solid lines was shifted using a free-energy offset parameter for the construction of the EDS refer-ence state Hamiltonian. The results show how the AEDS ap-proach is able to make a Hamiltonian that describes the system

in a more accurate way (Perthold and Oostenbrink,2018).

Emax =Emax‡ (2.10)

And the Eminis calculated such that the maximum potential energy barrier in

H∗

R(�r)is reduced to a chosen value ΔE∗max, which is a multiple of the standard

deviation σHi(r) with the lowest average energy min

(29)

2.2. Simulation work-frame 15

Emin =

  

2�min�Hi(r)−EiR�+ΔE∗max�−E‡max 2E‡maxΔE∗max+2Emax‡ min�Hi(r)−EiR

−(Emax‡ )2−min�Hi(r)−ERi

�2

2ΔEmax

(2.11)

Depending if Emin ≥min�Hi(r)−ERi �or Emin <min�Hi(r)−EiR�. The

en-ergy offset of the first state is set to zero and the subsequent offset parameters of the end states visited are calculated by free energy perturbation:

ERi�=1 =kBT ln �� e−[H∗i�=1(r)−H∗R(r)]/kBT � R � +kBT ln �� e−[H∗ 1(r)−H∗R(r)]/kBT � R � (2.12) An additional parameter called memory decay time τΔV is included in the energy offset to allow a faster adjustment. This decay time acts as a damping factor that is linearly interpolated between two different values at the begin-ning and the end of the simulation run that are chosen by the user, modifying the behavior of the convergence of the Eo f f:

� e−[H∗ i(r)−HR∗(r)]/kBT � t = � 1−e−Δt/τΔVe−[H∗ 1(r)−HR∗(r)]/kBT(t) +e−Δt/τΔV�e−[H1∗(r)−HR∗(r)]/kBT� t−Δt (2.13)

Where t is the simulation time and Δt the chosen time step in the simulation. This allows to tune convergence in systems where the Eo f f fluctuations are large. The interpolation of the decay time follows the next formula:

τΔV =τAΔV+�τBΔV−τAΔV� t

ttot (2.14)

Where ttotis the total simulation time.

2.2 Simulation work-frame

All the calculations in this work were performed using the simulation soft-ware GROMOS (Christen et al., 2005) and the visualization and post pro-cessing of trajectories were done with VMD (Humphrey, Dalke, and Schul-ten,1996), Python scripting and additional built-in GROMOS programs. The calculations for both main analysis were performed using the Dutch super-computer Cartesius. All the free energy analysis methodologies that are em-ployed in this work are embedded in the GROMOS software package for molecular simulation.

(30)

16 Chapter 2. Methodology When introducing a perturbation on the system such as amino acid muta-tions, two main methods regarding the modification of molecular topology exist: the single and dual topology approaches. In the single topology ap-proach, the force field parameters describing the different states of the sys-tem during the perturbation are linearly interpolated, such as the charge, bond angle, dihedrals, etc. In the dual topology approach, however, the po-tential energy of the states is interpolated and not the force field parameters. This dual topology has integrated the λ-dependant mutation in the form of additional dummy atoms that do not interact with each other and have no force field parameters of interaction energies. As the system changes, the parameter λ defines the mixing between the two coexistent topologies. This last approach is especially useful when perturbations require the creation or destruction of atoms. In the case of study, dual topologies have been used for the mutated amino acids on the XTI and AEDS calculations, as can be seen in figure2.4.

(A) Dual topology of S72. (B) Serine. (C) Isoleucine.

FIGURE 2.4: Representation of the dual topology for the 72nd

residue of the CYP450 BM3 M11 mutant and the chemical struc-tures of serine and isoleucine. In the dual topology picture, the serine amino acid is depicted, with carbon atoms in green, oxygen in red, nitrogen in blue and hydrogens in white. The dummy atoms are depicted in brown, and correspond to the atoms of isoleucine. During the perturbation, the OH termi-nus of serine will become a dummy appendix while the three existent dummy atoms will change to conform the isoleucine

structure.

The structures that are used in this work were previously obtained by R.A. Luirink and coworkers at the MOLTOX group. In order to obtain the ini-tial structures, the BM3 M11 pdb structure was used as a template. Missing residues from the crystal structure were added a posteriori as well as the muta-tion V87I using Modeller 9.3 (Šali and Blundell,1993) and then the structure was allowed to reorganize to attain a more relaxed conformation to avoid steric clashes. Testosterone structure undergone energy minimization using

(31)

2.2. Simulation work-frame 17 the MMFF94 (Halgren, 1996) force field with OpenBabel 2.3.9 (O’Boyle et al., 2011) and then partial charges were obtained with GAMESS (Schmidt et al., 1993) at Hartree-Fock level of theory using the 6-31G* basis set and then the structure was minimized at B3LYP/COSMO/6-31G* level of the-ory. The minimized testosterone was then used in docking simulations using the software PLANTS 1.2 (Korb, Stützle, and Exner,2007) and the PLANTS ChemPLP scoring function. The resulting docking poses were filtered so the angle between the iron on the heme and the OH in testosterone had to be between 90 and 150 degrees, while the distance between the α and β hydro-gens and the ferryl oxygen had to be below 0.4 nanometers. The remaining poses were visually inspected and 3 docking structures were selected. Par-tial charges were generated with the above mentioned protocol and the re-maining interaction parameters for testosterone were obtained from the Gen-eral Amber Force Field (Wang et al., 2004) using the Antechamber version 1.27 (Wang et al., 2006). The protein was described using the parameters in Amber99SB-ILDN force field, and for the heme cofactor the parameters were obtained from the following reference (Shahrokh et al.,2012). The GRO-MACS 5.1.3-gcc version (Abraham et al.,2015) was used to perform molecu-lar dynamics on the system. Thirteen sodium atoms were added to neutral-ize the system and around 14000-15000 water molecules were used to solvate it. One structure gave a stable β-pose and the other two flipped over the α and β conformations. The stable β-pose undergone a GROMOS simulation for 50 nanoseconds and remained stable. The dummy atoms for the later perturbation S72I were included using the built-in GROMOS program gca. The α-pose was later obtained during an AEDS parameter search simulation that started with a β-pose that then turned into a stable α conformation. The pose was then subject to a minimization in vacuum, thermalization and sol-vation and an additional MD simulation for 10 nanoseconds was performed in GROMOS to check for its stability.

The starting structures used in this work displayed a clear orientation of the α or β hydrogens of testosterone towards the reactive iron of the heme domain. These orientations are ensured to lead to the respective α or β hydroxylated products, these results being backed up by previous docking and MD calcu-lations for other substrates of CYP450 BM3 (Olah, Mulholland, and Harvey,

2011, Lonsdale et al.,2013, Leth et al.,2019). The resulting two starting con-figurations can be seen in figure2.5.

2.2.1 XTI methodology

To perform XTI on the CYP450 BM3 mutant, an existing pdb structure of the mutant with the testosterone bound to it and solvated in an α-pose obtained previously as explained above was used.

For the XTI simulations, the necessary input blocks to create the perturba-tions were obtained by direct communication with C. Oostenbrink, and the

(32)

18 Chapter 2. Methodology

FIGURE 2.5: Left: Starting configuration of the α-pose of BM3

M11. Right: Starting configuration of the β-pose of BM3 M11. The images have been obtained with VMD. The testosterone and the heme cofactor are depicted in bonds, with the α and βhydrogens depicted using CPK spheres in orange and white

respectively.

standard perturbation parameters were designed using the standard GRO-MOS protocol described in the user manual: the mutated serine atoms were modified to dummies and the previous dummies were changed into stan-dard isoleucine atoms with parameters extracted from the GROMOS force field. The initial configurations were minimized in vacuum using GROMOS to eliminate possible steric clashes and prevent SHAKE errors due to close atoms on future simulations. After a successful minimization, the system was equilibrated to 298K with various simulations that included a gradual in-crement of the system’s temperature. To maintain the values of temperature and pressure and ensure a correct sampling, a Nosé-Hoover chain thermostat was used and the pressure was coupled to a pressure bath. The rotation and translation of the center of mass was removed every 1000 simulation steps. The SHAKE algorithm was applied to the solvent and solute bonds as the atomic constraint of choice. Then a MD run was set to test the stability of the system. The run was prolonged for 10 nanoseconds, to confirm the preserva-tion during the run of the α and β poses. The final stable structure resulting from the MD runs was used to set up the XTI simulation. The alchemical cy-cle used for both poses to calculate the relative binding free energy was the mutation of the starting single mutant V87I to the double mutant V87I S72I, with the testosterone docked into the active site. The perturbation consisted of the mutation of serine 72 to isoleucine, described in figure 2.4. Regard-ing the perturbation parameter, 21 λ points equally spaced from λ = 0 to

(33)

2.2. Simulation work-frame 19

λ = 1 were simulated to elaborate the energy profile of the system,

includ-ing the precalculation of energies required by XTI as described previously on the methodology section. The runs started with 1 nanosecond produc-tion followed by 9 nanoseconds of prolongaproduc-tion to ensure that the configura-tion was stable under the given condiconfigura-tions. The output files obtained in the simulations were analyzed using standard XTI GROMOS scripts and then the �H

∂λλ values for each lambda point were directly obtained along with

the free energy difference of the system by running specific XTI GROMOS scripts. Along with the XTI values, the standard TI points were also extracted and the correspondent energy landscape was made in order to compare the results obtained with both methods.

2.2.2 AEDS methodology

The AEDS block required to perform the calculations and the additional sup-plementary material that included detailed information on the AEDS was obtained by direct communication with J. Perthold, since the syntax on the GROMOS files was different than the one employed in the XTI perturbation bloc. The same perturbation was applied in this case and the same simula-tion values detailed for the XTI were applied to the AEDS calculasimula-tions. The CUDA libraries were used as the acceleration method to speed up the cal-culations during the AEDS simulations, being applied to solve calcal-culations involving solvent molecule interactions. The post processing and posterior analysis were performed using additional GROMOS programs specifically designed to extract specific values from the output files, perform statisti-cal analysis or generate coordinate files in different formats among many other functionalities (Program Documentation GROMOS). Additional in-house scripts to obtain AEDS data were obtained by direct communication with C. Oostenbrink. The parameter search simulation for the β-pose was conducted after performing an initial minimization of the original structure. Additional MD runs were performed for 10 nanoseconds to check the stability of the structure. The last structure obtained from the MD runs was used as the initial structure to perform the parameter search.

Three sets of simulations were performed for the β-pose, with σ values 1, 2 and 3 to test the influence of the σ parameter on the structures. An initial free search was conducted to find Emin Emax and Eo f f. In order to affect the

convergence of the offset, 3 new simulations were launched with the same σ parameters under free optimization but with different decay times to test its influence on the convergence process. For all the existing σ values ranging from 1 to 3 an additional simulation was performed with a progressive decay time increase of 100 picoseconds every 10 nanoseconds. The decay time was modified using external scripts since the current version of GROMOS does not support the modification of the decay time by default.

(34)
(35)

21

Chapter 3

Results and discussion

3.1 XTI on α and β poses

Due to time limitations, only the first nanosecond of XTI simulation data is displayed, since the prolongation runs for some λ points were still running while this project had to be handed in. In addition, an error in the perturba-tion topology was found later during the XTI run, where one carbon of the serine amino acid (CB 678) was being perturbed to a wrong GROMOS atom type. This mistake was noted when the initial simulations were already run-ning and in order to correct it a new simulation for the α-pose with a cor-rect topology was launched, but the results will not be displayed since they are still running. The results obtained after the full XTI analysis on the first nanosecond of the α-pose are displayed in table3.1. The XTI implementation in GROMOS provides the user with TI and XTI data after the simulation is done, which can be used to compare these two methods. To calculate the relative binding free energy of the system, the values of ΔGbindingfor the XTI

and the TI of the β-pose are still required and could not be provided due to lack of time.

α-pose XTI (kJ/mol) α-pose TI (kJ/mol)

49.60 50.85

TABLE3.1: Relative binding free energy values for the XTI and

TI on the α-pose. Standard deviations could not be obtained from the simulations since the program does not support them

by default.

The values for the free energy difference obtained from the XTI simulation are comparable, since both TI and XTI display a similar energy landscape albeit the landscape provided by XTI shows a smoother surface. The trajec-tories that resulted from the prolongation runs were analyzed and, as can be seen in figure3.2, detachments from the active site were found for low values of λ. It is thought that this behavior could be prevented with the use of soft restrains on the distances between the ligand and the heme cofactor, but fur-ther research is needed to corroborate it. The detachment from the active site also carried the formation of a hydrogen bond between O2A of the heme co-factor and H4A of testosterone, as can be seen in figure3.3where the distance

(36)

22 Chapter 3. Results and discussion

FIGURE3.1: Analysis of the extended TI (black) and normal TI

(red) for the CYP450 BM3 S72I mutant on the α-pose for the first nanosecond of the run. Only the error bars of the standard TI could be provided since GROMOS does not include that

(37)

3.1. XTI on α and β poses 23

FIGURE 3.2: Snapshot of the system during the prolongation

run of the XTI simulation at λ = 0.1. Testosterone escapes the

active site at low values of λ, where a β configuration is more stable and displays a hydrogen bond with the heme cofactor. This result evidences the use of soft restrains in future simu-lations to prevent the detachment of the ligand and ensure an

adequate sampling.

between the alpha hydrogen is plotted against the formation of this new hy-drogen bond that stabilizes the detached conformation of testosterone. To maintain coherence with what was done with the α-pose, before starting the XTI on the β-pose it was decided that the starting configuration should come as well from a parameter search simulation. Unfortunately, during the course of this work, suitable β configurations were not obtained after an AEDS parameter search in time to have a full XTI run completed, since convergence issues with energy parameters arose.

(38)

24 Chapter 3. Results and discussion

FIGURE3.3: Distance analysis of the XTI prolongation run for

λ = 0.1. Each frame equals 2 picoseconds, for a total

simu-lation time of 9 nanoseconds of prolongation. During the first moments the structure maintains a stable α-pose but eventually detaches from the active site leading to a decoupled state where

testosterone has left the active site.

Extended TI has proved to be a solid option instead of standard TI, consider-ing that both are comparable in terms of computational efficiency while used with GROMOS and XTI is able to extract more information from the given system while also providing a smoother energy landscape that is capable of giving more accurate results in cases where the calculated λ points lead to a non smooth and more approximate energy landscape. In the future, new XTI simulations with less simulated λ points could be set up to test the trade-off between simulation time and quality of the analysis. Using XTI with less λ points would mean a clear decrease in the simulation time while the loss in accuracy can be compensated by the precalculation of λ values inherent to the XTI methodology. This is a reasonable approach since the comparison between TI and XTI data shows that accuracy marginally improves by using XTI over traditional TI at this number of λ points.

The stability of the ligand within the active site of the enzyme was tested during the XTI simulations, finding that in the XTI S72I α-pose run, the lig-and detached from the active site after several nanoseconds for low values of λ, where there amino acid 72 has a predominant serine character. This re-sult was in agreement with the previous work done, where it is detailed that the single mutant V87I hydroxylates testosterone in β position. Hydrogen bonds between testosterone and serine 72 previously reported in literature

(39)

3.2. AEDS on α and β poses 25 were found during XTI and AEDS runs but were not found in the simula-tions where the detachments took place. This result evidenced the need for soft restrains on the molecule to maintain the α-pose during the XTI simu-lation. Regardless, the results obtained during the first nanoseconds of the simulations are relevant since the ligand held an α-pose. The system reported SHAKE errors during simulations at low λ values, where the α-pose is less stable than the β-pose. This was due to extreme velocities of certain atoms that resulted from they coming too close together when reorganizing their positions to acomodate the effect of the perturbation, which could be solved by decreasing the time step and allowing for the relaxation of atoms. Another problematic of this approach was the choice of the perturbation steps, since the perturbation had to be gradual enough to ensure that the successive sys-tem phase spaces overlapped but, at the same time increasing the simulated

λpoints also increased the simulation time. It was also relevant to ensure the

appropriate sampling of the phase space for each value of λ by performing long enough simulations.

Extended thermodynamic integration is a step forward towards the opti-mization of free energy calculations involving the standard TI methodology combined with a new approach that is able to reduce the simulation time. The use of progressive mutations defined on the λ parameter allows for a gradual change of the system while the precalculation of the λpsmooths the

resulting energy landscape providing a more accurate profile for the numer-ical integration. The main advantage of using XTI over AEDS was the sim-plicity and yet robustness of the method, with the only drawback of the long simulation times.

3.2 AEDS on α and β poses

The accelerated enveloping distribution sampling developed by Oostenbrink and coworkers (Perthold and Oostenbrink, 2018) has been recently devel-oped and it has been tested on systems that are chemically simpler than the enzyme-ligand system that is being studied in this work.

The two sets of simulations for the α and β poses had two differentiated parts: the parameter search and the production run. For the β-pose, the pa-rameter search simulation had to be performed. The initial structure was checked to ensure that it was a β-pose and then was submitted to minimiza-tion and equilibraminimiza-tion simulaminimiza-tions as detailed in the GROMOS protocol to re-lax the structure and bring it to 298K. The perturbation topology was adapted from the XTI simulations, modifying the syntax and making it suitable to the AEDS requirements.

The original objective of this work was to analyze the inversion of stereos-electivity in the hydroxylation of carbon 16 in testosterone by the BM3 mu-tants upon mutation of the 72nd residue of the active site. After encountering the first problems during the AEDS parameter search and the detachments of

(40)

26 Chapter 3. Results and discussion testosterone from the active site on the XTI prolongation runs, this objective was postponed until a feasible protocol to set up reliable AEDS simulations and a way to keep the poses during XTI was found.

3.2.1 Parameter search

There were three independent parameter search simulations for the β-pose, each of them having a different maximum energy barrier parameter (σ), since these values are calculated as multiples of the standard deviation of the en-ergy of the system. These three independent runs were performed to check if the conformation that was being studied in each case was stable under the parameters of choice and how it changed with the election of σ, which has a notable impact on the shaping of the energy landscape of the reference state. Three sets of simulations were performed for the β-pose, with σ values rang-ing from 1 to 3, allowrang-ing for free optimization of Emin, Emaxand Eo f f. The

pa-rameter search simulations were conducted for 20 nanoseconds for the three systems. The energy offset convergence and the distances between the α and

βhydrogens and the iron atom of the heme group are displayed in figures

3.4,3.5and3.6and the state coverage of the systems with the obtained energy parameters in table3.2: β-pose Transitions 66 58 83 Visit 1st state 34 30 42 Visit 2ndstate 33 29 42 Emax(kJ/mol) -61.92 -63.23 -61.79 Emin(kJ/mol) -256.45 -338.18 -179.80

TABLE 3.2: State coverage analysis of the parameter search of

the β-pose, conducted at three different values of σ. The transi-tions between states conform the sampling of the energy land-scape and allow for the optimization of the energy parameters that will be used in the production run. The values for the Emax

and Eminparameters are also displayed as reported in the last

simulation step.

All three systems displayed a similar behavior, showing transitions between states, maintaining a β conformation during the simulation and having a non converging Eo f f parameter, with a comparable value for the Emaxparameter.

Out of the three simulations, the final structures were obtained and analyzed. Only the 2σ stucture still displayed a hydrogen bond between the protein backbone on the serine 72 residue and testosterone while still maintaining a

β-pose, which can be seen in figure3.7. This system configuration was the

chosen one to carry on the production AEDS simulation. In parallel, the 2σ parameter search was prolonged for an additional 20 nanoseconds with fixed Emin Emax to seek for an optimized vale of the offset parameter. Compared

(41)

3.2. AEDS on α and β poses 27

FIGURE3.4: State coverage graphs of the parameter search

sim-ulations for σ = 1, 2, 3 of the β-pose for 20 nanoseconds

simu-lation. The graphs show which state the system is visiting over time and how much time it remains in that state.

is found that in this system convergence is not achieved during the first pa-rameter search simulation. During the prolongation run with fixed parame-ters, the energy offset was still fluctuating without reaching convergence (not shown).

Given that the AEDS methodology had not been applied to calculate relative binding free energies on protein-ligand systems and that the existing results in literature displayed a different behavior on the method performance re-garding offsets convergence, a new set of simulations was prepared to mod-ify the decay time as explained in the methodology section. A system con-taining a serine residue with non polar caps was mutated into isoleucine in water using the AEDS parameter search scheme. The idea behind this method is that the acquiring of the energy parameters for the mutated residue is faster and the values converge better when the amino acid is in water

(42)

28 Chapter 3. Results and discussion than when it forms part of the protein, mainly because of the lack of steric hindrances. Once these initial values are obtained, a new set of parameter simulations with the protein-ligand system can be launched, starting free or restricted parameter search runs based on the values found in the wa-ter simulations. This methodology was suggested by C. Oostenbrink as a way to find quicker converged values during parameter search, and it has previously been applied successfully on free ligands that would be mutated during AEDS, but not on protein amino acids. The first simulations already reported converged values for Emin, Emax and the Eo f f, as can be seen in

fig-ure3.8. Rising the decay time acts as a damping mechanism that allows a quicker and more steady convergence of the simulation parameters, avoid-ing big fluctuations.

After the simulations in water displayed a converging tendency of the energy parameters, a new set of parameter search simulations for the β-pose was prepared. The decay time was risen 100 picoseconds every 10 nanoseconds, as happened in the free solvated ligand simulations. The results can be seen in figure 3.9, and despite not being finished they already show a change in the behavior of the energy parameters towards convergence to a finite value. These simulations need to be prolonged until 20 nanoseconds minimum and then they can be refined if the following parameter search simulations are conducted fixing the values of Emin and Emax and only optimize the energy

offset parameter.

In systems where the offset parameter, Emin and Emax rapidly converge, the

methodology allows for a faster calculation of free energy differences than XTI. In this work, a more challenging system was being studied and thus a methodology for ensuring an optimal parameter search had to be found. The main reasons that lead to a difficult parameter search were the size of the system, its larger quantity of intrinsic degrees of freedom compared to the systems of study in the original AEDS paper and the relative low stability of the α and β structures under perturbation.

3.2.2 Production run α-pose

After obtaining the first results from the β-pose simulations it was concluded that further parameter search simulations were needed before the produc-tion run started, since convergence of the energy parameters had not been achieved. For the α-pose, the values were read from the existing starting configuration obtained from a previous parameter search by L. Egger. This simulation showed evident issues with our system of study: the energy off-set parameter had not reached adequate convergence during the parameter search run and thus the AEDS Hamiltonian was not optimally constructed. This can be seen when analyzing the state coverage of the production run, displayed in figure3.10, which only shows 6 state transitions during the 20 nanosecond production run. In addition, the simulation displayed an in-version of configuration of the system from an α-pose to a β-pose, as can

(43)

3.2. AEDS on α and β poses 29 be seen in figure3.11where after 2 nanoseconds the testosterone ligand de-taches from the active site and rotates to a β-pose when docked back into the reactive center of the enzyme. This behavior was not expected since the S72I mutation reported α hydroxylation on the C16 of testosterone and it is attributed to the use of a wrong offset and Emin value and a possible lack of restrains on the system. The use of restrains in the form of harmonic po-tentials to enforce a desired pose become an option when the configuration needed during the simulation is not stable enough. The restrains are spec-ified before the beginning of a simulation run and they can be applied to bonds or to pairs of atoms to apply a large penalty to the movement of those restricted atoms.

Thorough production runs for the α and β-poses with converged energy pa-rameters were not conducted during this work due to lack of time. It was thought that, given the importance of choosing correct energy parameters for the production run and the impact that they have on the outcome of the simulations, a protocol to guarantee an optimal choice of these values was a priority. Nevertheless, the failed production run for the α-pose reported important issues that were relevant for understanding the impact of the pa-rameter search for the system of study. In the future, production runs with converged values for Emin, Emax and Eo f f will be prepared with the addition

(44)

30 Chapter 3. Results and discussion

FIGURE3.5: Plots of the Emin, Emaxand Eo f f obtained during the

20 nanoseconds parameter search simulations for the β-pose system. The graphs do not show convergence of the Eo f f

to-wards a specific value but they oscillate without a defined limit. The Emin parameter does not reach a stable value in the cases

of σ = 2, 3, which evidences the need for a better parameter search.

(45)

3.2. AEDS on α and β poses 31

FIGURE3.6: Distance analysis of the α (in black) and

H16-β(in red) during the parameter search simulation for the three different values of σ. The distance between the given hydrogen and the iron center of the heme cofactor determines if the sys-tem maintains the original β-pose. The gaps in the graphs are the result of corrupted simulation files that were not written

(46)

32 Chapter 3. Results and discussion

FIGURE3.7: Snapshot of the final configuration of the BM3

mu-tant on the 2σ simulation. The distance between H16-β (orange ball) and the iron on the heme cofactor is displayed as well as the hydrogen bond between serine 72 and the testosterone

(47)

3.2. AEDS on α and β poses 33

FIGURE 3.8: Parameter search simulations for the serine to

isoleucine mutation in water. The decay time has been modi-fied and risen 100 ps every 10 ns. Values for the three required parameters show convergence during the first 40 nanoseconds. It can be seen how Emax remains almost constant in the three

simulations while Emingoes up with σ and the Eo f f decreases

(48)

34 Chapter 3. Results and discussion

FIGURE3.9: Graphs for Emin, Emaxand Eo f f obtained in the

pa-rameter search simulation for the β-pose with increasing decay time at different values of σ. The structures maintained a β-pose during the simulated time (not shown). Even though the simulations are not finished, the tendency towards a converged value of the energy offset can be seen in all three graphs. This improvement in the results for the parameter search can be seen clearly if the graphs are compared with the ones in figure3.5.

(49)

3.2. AEDS on α and β poses 35

FIGURE 3.10: State coverage of the production AEDS run for

the α-pose. It is clear that during most part of the simulation only one end state is being visited, and thus the sampling ac-quired is inadequate for the correct analysis of the system’s

properties.

FIGURE 3.11: Distance analysis between α (black) and β (red)

hydrogens and the iron of the heme cofactor for the AEDS pro-duction run. The fact that the initial configuration is not main-tained during the production run shows how the system’s en-ergy landscape has been inappropriately constructed and how additional constrains on the geometry of the system may be

(50)

Referenties

GERELATEERDE DOCUMENTEN

To derive these numbers, we will consider the 16 possible ( 2, 2 ) -tilings with given boundary spins on the left and right side and calculate the number of feasible configurations

Addressing the themes of the female body, pain, grief and violence, the quoted poetry of Warsan Shire recognizes the pain women endure and gives this group a voice, while

interaction of the dye with oxygen (quenching) is competing with an intramolecular decay pathway. S10 and Table S6, ESI†), in the presence of oxygen, shows similar behaviour as

Multi-level&amp;actor internationale normstelling met grote feitelijke impact ( innovation tool ), ook juridisch gezag?.. RTI –

y-grOCIIO &lt;laarin sal slang om die poli ticl'c mondstuk ntn die g corganisccr- do \'alwnies tc word ttic... kels deur di e strate

Maar wat het verkeersaanbod betreft lijkt er in Noord-Holland slechts weinig verschil tussen die twee weekendnachten te bestaan, zodat er in de vrijdagnacht ook in

Please download the latest available software version for your OS/Hardware combination. Internet access may be required for certain features. Local and/or long-distance telephone

Most similarities between the RiHG and the three foreign tools can be found in the first and second moment of decision about the perpetrator and the violent incident