Exploring unwrapping states of nucleosomal DNA


THESIS

submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in

PHYSICS

Author : K. van Deelen

Student ID : s1282220

Supervisor : prof. dr. H. Schiessel

2nd corrector : prof. dr. L. Giomi


K. van Deelen

Huygens-Kamerlingh Onnes Laboratory, Leiden University, P.O. Box 9500, 2300 RA Leiden, The Netherlands

April 23, 2020

Abstract

This study uses the rigid base pair (rbp) model and Markov chain Monte Carlo (MCMC) sampling to simulate the unwrapping of nucleosome core particles (NCPs). The model is sequence dependent and is used to investigate the bias towards left or right unwrapping and the effect of weakening the nucleosome bindings for several DNA sequences (Widom-601, the sea urchin 5S gene and 601 derivatives). We are able to focus on intermediate stages of unwrapping, which may not always be visible in experiments. We validate the model by comparing its outcomes to experimental results, and we propose a (simple) method to find interesting sequences [...]


1 Introduction
2 Methods
  2.1 Models
  2.2 Metropolis Algorithm
  2.3 Unwrapping the DNA molecule
3 Practicalities and formulae
  3.1 Measurements and correction
  3.2 Binding to a nucleosome: adsorption energy
  3.3 Boltzmann weights of unwrapping states
  3.4 Access probability
  3.5 Probability of unwrapping
4 Results and Interpretation
  4.1 Elastic energy of a DNA molecule
  4.2 Total energy of an unwrapping state
  4.3 Total energy and relative probability of all unwrapping states
  4.4 Unwrap distribution
  4.5 Access probability
  4.6 Probability of unwrapping
5 Conclusion
Appendices
A Sequence information
  A.1 Sequences used
B Unwrapping states of all studied sequences
  B.1 Total energy and relative probability
  B.2 Cumulative probability and relative asymmetry in unwrapping
C Simulation details
  C.1 Equilibration
  C.2 Correlation
  C.3 Simulated Annealing
  C.4 Error Analysis

Chapter 1

Introduction

DNA can be a very long chain of hundreds of millions of nucleotides and has to be packed very tightly to fit inside a nucleus: its length is usually on the order of decimeters, while the nucleus is a sphere on the order of micrometers. To achieve this highly packed form, DNA is wrapped in several hierarchical structures [1]. At the lowest level of compaction the DNA is part of fundamental units called nucleosome core particles (NCPs), which consist of around 147 base pairs of double-stranded DNA wrapped in about 1¾ turns in a superhelical configuration around a protein core. This configuration requires the DNA to bend strongly and may induce deformations from its ideal double helical structure [2]. The DNA contains a sequence of base pairs, which in turn influences the mechanical properties of the double helix, such that very rigid or very flexible sections may occur. Very rigid sections of DNA are very unlikely to wrap around a histone core, in contrast to more flexible sections [3]. The induced bending also results in a certain sequence affinity for nucleosome positioning [4].

DNA is known to spontaneously unwrap and rewrap small sections due to thermal fluctuations, possibly for gene expression [4, 5], but sometimes a large section needs to unwrap in order for some sites in the buried nucleosomal DNA to become accessible. This unwrapping could happen symmetrically, from both sides with roughly equal probability [6], but some sequences are known to unwrap very asymmetrically, such as the widely studied artificial Widom-601 sequence and the naturally occurring sea urchin 5S gene [3, 7, 8], which could have implications for the directionality of transcription. However, unwrapping large sections of DNA requires the bindings between the histone core and the DNA to [...] understanding of DNA binding to the protein core: how can it be that gene expression occurs while DNA spends most of its time being mostly wrapped, and how can it unwrap in large sections; how does this depend on the DNA sequence, and how is it influenced by the weakening of the bindings between the DNA and the histone core?

We suggest this process is heavily influenced by the mechanical information of the sequence, as it influences the affinity to wrap around a histone core and may determine the bias in unwrapping, and by the concentration of counter-ions in the environment. Previous work using the rigid base pair (rbp) model in a nucleosome model has recovered basic nucleosome positioning rules [10]; explored nucleosome unwrapping through induced force [9, 11]; and studied the sequence dependency of nucleosome breathing by looking at the accessibility for restriction enzymes at certain sites in the nucleosomal DNA [12]. These earlier uses of the model already explored some aspects of nucleosome unwrapping, but did not take the binding strength of the DNA to the histone core into account.

Much experimental work has been done to understand how NCPs unwrap, but we will mainly focus on the results by Mauney et al., where several stages in the unwrapping of the DNA sequences Widom-601 and the sea urchin 5S gene are found [13]. As in previous experimental work [14, 15], the nucleosome binding strength is weakened by increasing the salt concentration, which introduces billions of Na+ ions. The increase in salt concentration gives rise to an increased number of structures that are highly unwrapped. We will map this salt concentration to the theoretical adsorption energy required for the DNA to bind to the histone core. In this report we will show to what extent our model is able to reproduce experimental results and develop several analyses that can suggest interesting sequences for further experimental research.

Chapter 2

Methods

2.1 Models

To simulate a (partially) wrapped DNA molecule we use a rigid base pair (rbp) model. We assume the DNA molecule behaves like a polymer chain, consisting of linked rigid plates which represent the base pairs. Each base pair is connected to its previous and next neighbours in the chain and has six degrees of freedom: it can translate and rotate in three dimensions. The actual nucleosome is not explicitly constructed, but implied by forcing the polymer chain into a superhelical configuration. In our nucleosome a set of 28 constraints is enforced, corresponding to the binding sites where the DNA molecule is bound to the nucleosome, where the minor groove of the double helix faces inwards. At each of the fourteen binding sites there are two constraints (see Fig. 2.1). Each constraint fixes the location and orientation of the mid-frame between consecutive base pairs, called a base pair step, which corresponds to the phosphate groups of the DNA backbone.

Figure 2.1: Visualization of the rigid polymer chain ‘wrapped’ around the histone core. The green plates are the base pairs of the DNA sequence (rbp model). The sequence is bound to the histone core at the binding sites, represented as dots (nucleosome model). Source: [12]

Even with these constraints, there is an enormous number of possible configurations of the molecule, but not all are equally likely to occur. For DNA to be wrapped in a nucleosome structure, it has to be sharply bent and twisted, and this induces deviations from the ideal superhelical structure. In our model we assume these deviations induce a quadratic deformation energy. The model takes only nearest-neighbour interactions into account, in which case the deformation energy for a base pair step (two consecutive base pairs) is given as:

\[ E = \frac{1}{2} (q - q_0)^T K (q - q_0) \tag{2.1} \]

where q is a 6-dimensional vector containing the translational and rotational degrees of freedom of a base pair step, q_0 its intrinsic, preferred values, and K is the (6 by 6) ‘stiffness’ matrix, coupling the interaction between two base pairs. q_0 and K are different for each type of base pair step, which makes the model sequence dependent; both are fully parametrized in the literature [16, 17]. In order to get likely configurations of the DNA molecule, we use the Metropolis algorithm to generate a probability density function of possible configurations, from which we then sample.
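The quadratic deformation energy of a single base pair step can be sketched as below. This is a minimal illustration, assuming plain Python lists for q, q_0 and K; the function name and all numerical values are hypothetical placeholders and not the parametrization of [16, 17].

```python
def deformation_energy(q, q0, K):
    """Quadratic deformation energy 1/2 (q - q0)^T K (q - q0) of one base pair step."""
    d = [a - b for a, b in zip(q, q0)]  # deviation from the intrinsic values
    return 0.5 * sum(d[m] * K[m][n] * d[n] for m in range(6) for n in range(6))


# Illustrative (not parametrized) values: identity stiffness, small excess in the last coordinate.
q0 = [0.0, 0.0, 3.4, 0.0, 0.0, 0.6]
q = [0.0, 0.0, 3.4, 0.0, 0.0, 0.8]
K = [[1.0 if m == n else 0.0 for n in range(6)] for m in range(6)]

energy = deformation_energy(q, q0, K)
```

With nearest-neighbour interactions only, the energy of a whole chain would be the sum of this quantity over all base pair steps, each with its own q_0 and K.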

2.2 Metropolis Algorithm

To generate random configurations of a wrapped DNA molecule we use the Metropolis algorithm. The Metropolis algorithm is a Markov chain Monte Carlo method to generate a sequence of random samples from an unknown or difficult-to-calculate probability distribution. The initial state is the configuration where the molecule is forced into a superhelical structure [...] whether this base pair is part of a binding site or not. Unbound base pairs can perform every move (change location and orientation), but bound base pairs can only make moves that keep their mid-frame at the same location and orientation, as the mid-frame corresponds to the phosphate group of the DNA backbone that is bound to the histone core.

After every base pair step has had a chance to make a move, a new configuration of the molecule has been constructed, for which the deformation energy can again be calculated. This new configuration is either accepted, forming the next state in the Markov chain, or rejected, in which case the current configuration is used again to create the next. Whether the new configuration is accepted or rejected depends on the change in deformation energy. If the energy of the new sample is lower than that of the previous one, it is always accepted. If it is higher, it is accepted only with a probability that decays exponentially with the energy difference, as it is unlikely that the system will propagate to that state. The acceptance probability α depends on the difference in energy ∆E and the sampling factor β = 1/(k_B T_r), where k_B is the Boltzmann constant and T_r is room temperature:

\[ \alpha = \begin{cases} 1 & \text{if } \Delta E \le 0 \\ e^{-\beta \Delta E} & \text{if } \Delta E > 0 \end{cases} \tag{2.2} \]

where ∆E = E_new − E_previous.
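The acceptance rule of Equation (2.2) can be sketched in a few lines. This is a generic Metropolis acceptance step, not the simulation code used for this thesis; the function names are hypothetical and the energies are assumed to be given in units of kT_r.

```python
import math
import random


def acceptance_probability(delta_e, beta):
    """Acceptance probability of Eq. (2.2): 1 for downhill moves, exp(-beta * dE) otherwise."""
    if delta_e <= 0:
        return 1.0
    return math.exp(-beta * delta_e)


def metropolis_accept(e_new, e_prev, beta, rng=random):
    """Return True if the proposed configuration becomes the next state of the chain."""
    # rng.random() is uniform on [0, 1), so a probability of 1.0 always accepts.
    return rng.random() < acceptance_probability(e_new - e_prev, beta)
```

A larger sampling factor β makes uphill moves exponentially less likely, which is why a high β narrows the spread of the measured energies.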

If we do this for a large enough number of steps, the Markov chain will eventually converge to a probability distribution of configurations of the DNA molecule. We can now sample from this distribution (‘measurement’ step), and do this multiple times to make sure we get a sufficiently accurate estimate of the energy. Consecutive samples in the Markov chain are highly correlated, as each sample depends on the previous one, but we can simply generate additional samples between ‘measurement’ steps to lower the correlation.

If we sample at a high temperature T = 1 (= T_r) we get highly fluctuating measurements, with a high standard deviation. To decrease the standard deviation of the measured energies, we use a high sampling factor β = 1000. Simulated annealing is used to ensure the Markov chain converges around the state with a global minimum in deformation energy. This means the system slowly reaches equilibrium over a large number of steps, each time decreasing the sampling temperature T = 1/β, starting at T = 1 (room temperature) and ending at T = 1/1000. For more details on [...]

binding site   first base pair step   second base pair step
1              2, 3                   6, 7
2              14, 15                 17, 18
3              24, 25                 29, 30
4              34, 35                 38, 39
5              45, 46                 49, 50
6              55, 56                 59, 60
7              65, 66                 69, 70
8              76, 77                 80, 81
9              86, 87                 90, 91
10             96, 97                 100, 101
11             107, 108               111, 112
12             116, 117               121, 122
13             128, 129               131, 132
14             139, 140               143, 144

Table 2.1: Base pair indices for the first and second base pair step of each binding site. The total length of the sequence is 147 base pairs with indices 0, ..., 146.

2.3 Unwrapping the DNA molecule

The DNA molecule is ‘wrapped’ due to 28 enforced constraints, corresponding to the 14 binding sites where the DNA is bound to the histone core. These constraints are imposed on the base pairs where the histone side chains would be bound to the phosphate groups of the DNA backbone. These phosphate groups lie between consecutive base pairs, a pair of which we call a base pair step. A binding site consists of two base pair steps; their locations are derived from crystal structures and reproduced by the nucleosome model [10], see Table 2.1. We will call base pair steps belonging to a binding site ‘bound’, and those not belonging to a binding site ‘unbound’.
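The bound/unbound classification can be made concrete by storing Table 2.1 as data. The sketch below is only an illustration: the names BINDING_SITES, BOUND_STEPS and is_bound are hypothetical, and a base pair step is assumed to be represented by the pair of indices of its two base pairs.

```python
# Table 2.1 as data: binding site -> (first step, second step),
# where each step is given by the indices of its two base pairs.
BINDING_SITES = {
    1: ((2, 3), (6, 7)),
    2: ((14, 15), (17, 18)),
    3: ((24, 25), (29, 30)),
    4: ((34, 35), (38, 39)),
    5: ((45, 46), (49, 50)),
    6: ((55, 56), (59, 60)),
    7: ((65, 66), (69, 70)),
    8: ((76, 77), (80, 81)),
    9: ((86, 87), (90, 91)),
    10: ((96, 97), (100, 101)),
    11: ((107, 108), (111, 112)),
    12: ((116, 117), (121, 122)),
    13: ((128, 129), (131, 132)),
    14: ((139, 140), (143, 144)),
}

# All 28 constrained base pair steps of the fully wrapped nucleosome.
BOUND_STEPS = {step for steps in BINDING_SITES.values() for step in steps}


def is_bound(step):
    """True if the base pair step (a, a + 1) belongs to a binding site."""
    return tuple(step) in BOUND_STEPS
```

For example, is_bound((2, 3)) holds for the first step of binding site 1, while the neighbouring step (3, 4) is unbound.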

Figure 2.2: Visualization of unwrapping a sequence by opening binding sites. In (a) four binding sites are open from the left (i = 4) and one from the right (j = 1). This means that sites 1 ≤ k ≤ 4 and k = 14 are accessible, and e.g. k = 7 is not. In (b) eight binding sites are open from the left, meaning that site k = 7 is now accessible. Source: [12]

To unwrap a DNA molecule the constraints of a binding site are revoked, allowing more degrees of freedom for the previously bound base pairs and leading to the relaxation of the then unwrapped section of the DNA molecule (see Figure 2.2). An unbound base pair step has 6 degrees of freedom for each base pair (3 translational and 3 rotational); a bound base pair step has a fixed mid-frame and thus fewer degrees of freedom: 3 for each base pair. We will consider ‘unwrapping states’ in which a certain number of binding sites is opened from the left end (5’-end) and from the right end (3’-end). We assume that a binding site 1 ≤ k ≤ 14 can only open if all binding sites between k and the left end, or between k and the right end, are also open. We call the deformation energy of such a state, including the energy freed by opening the binding sites (see Section 3.2), E_ij, with i [...]

Chapter 3

Practicalities and formulae

3.1 Measurements and correction

Our model gives us the free energy of the configuration of the DNA molecule. To get the elastic energy of unwrapping state (i, j), we need to correct for entropy, using the equipartition theorem:

\[ E_{\text{elastic}}(i,j) = E_{\text{model}}(i,j) - E_{\text{entropy}} = E_{\text{model}}(i,j) - E_{\text{equipartition}}(i,j) \tag{3.1} \]

\[ E_{\text{equipartition}}(i,j) = \frac{1}{2\beta} \cdot q = \frac{1}{2\beta} \cdot 6 \cdot \bigl(147 - (28 - 2(i+j))\bigr) \tag{3.2} \]

with β = 1/(k_B T) the sampling factor and q the total number of degrees of freedom, which depends on the total number of ‘free’ base pairs. For a completely wrapped state (i = j = 0) we have the equivalent of 147 − 28 free base pairs: the 14 binding sites each fix 2 base pair steps, consisting of 2 base pairs each, in place and orientation, and each of these 56 bound base pairs keeps only 3 of its 6 degrees of freedom, so together they count as 28 free base pairs. In Section 2.3 we have seen that for each bound base pair step the degrees of freedom of each base pair are reduced by 3. Thus by opening (i + j) binding sites we gain 3 · 2 · 2 · (i + j) = 6 · 2(i + j) degrees of freedom.
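The counting in Equations (3.1) and (3.2) can be checked with a short sketch. The function names are hypothetical, and energies are assumed to be in units of kT_r, so that β is the unitless sampling factor.

```python
def degrees_of_freedom(i, j, n_bp=147, n_sites=14):
    """Total degrees of freedom q of unwrapping state (i, j), as in Eq. (3.2)."""
    return 6 * (n_bp - (2 * n_sites - 2 * (i + j)))


def equipartition_energy(i, j, beta=1.0):
    """Equipartition contribution q / (2 * beta) of Eq. (3.2)."""
    return degrees_of_freedom(i, j) / (2.0 * beta)


def elastic_energy(e_model, i, j, beta=1.0):
    """Elastic energy of Eq. (3.1): model energy minus the equipartition term."""
    return e_model - equipartition_energy(i, j, beta)
```

For the fully wrapped state this gives q = 6 · (147 − 28) = 714, and each opened binding site adds 12 degrees of freedom, matching the count 3 · 2 · 2 per site given above.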

3.2 Binding to a nucleosome: adsorption energy

To get the total energy of an unwrapping state, we have to simulate the effect of the DNA molecule being bound in a nucleosome. In the rbp model this is done by enforcing certain constraints, but we also need to take the energy required to open these bindings into account. We do this by adding it to the elastic energy acquired from the rbp model. Each closed binding site lowers the total energy of the system by the adsorption energy that was gained by binding the base pairs to the histone core, so releasing a binding site costs this amount. We assume that this adsorption energy is equal for every binding site, i.e. that the adsorption energy distribution is uniform. Then for an unwrapping state (i, j) the total energy is given by:

\[ E_{\text{tot}}(i,j) = E_{\text{elastic}}(i,j) - (14 - i - j) \cdot E_{\text{ads}} \tag{3.3} \]

where E_ads is the adsorption energy per binding site (in units of kT_r). The magnitude of the adsorption term thus decreases by (i + j) · E_ads when we open (i + j) binding sites. From now on we will write the total energy of an unwrapping state, E_tot(i, j), as E_ij.
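Equation (3.3) translates directly into code. In this minimal sketch the function name is hypothetical and the numbers in the usage example are purely illustrative inputs, not simulation output.

```python
def total_energy(e_elastic, i, j, e_ads, n_sites=14):
    """Total energy E_ij of Eq. (3.3), in the same units as e_elastic and e_ads.

    Each of the (n_sites - i - j) binding sites that remain closed
    contributes -e_ads to the total energy.
    """
    return e_elastic - (n_sites - i - j) * e_ads


# Illustrative numbers only: an elastic energy of 65 with all 14 sites closed.
e_wrapped = total_energy(65.0, 0, 0, e_ads=6.5)
```

With these inputs the adsorption term is 14 · 6.5 = 91, so the fully wrapped state has a negative total energy of −26.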

The adsorption energy plays a big role in how a sequence unwraps from the nucleosome, as the change in total energy upon unwrapping is given by the difference between the energy gained by becoming straighter (less bent and strained) and the energy it costs to open a binding site. It is theorized that the total adsorption energy should be slightly higher than the total elastic energy that is required to bend the DNA sequence into a superhelical configuration [18]. Our simulation predicts the bending energy of a fully wrapped configuration to be around 65 kT_r, which is near predicted values [18]. We therefore expect the total adsorption energy to be of the same size but slightly exceeding it, by 1 to 2 kT_r per binding site, as it should be possible for the sequence to spontaneously unwrap and rewrap so that it may become accessible temporarily. This gives us a total adsorption energy in the range 79 – 93 kT_r. If we assume that the binding energy distribution is uniform, we get per binding site E_ads ≈ 5.5 – 6.5 kT_r.
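The arithmetic behind these ranges can be written out as a worked check, assuming exactly 14 binding sites and the rounded bending energy of 65 kT_r. The division gives values close to the rounded per-site range quoted above.

```latex
\begin{align*}
E_{\text{ads}}^{\text{total}} &\approx 65 + 14 \cdot (1 \text{ to } 2) = 79 \text{ to } 93 \; kT_r \\
E_{\text{ads}} &= \frac{E_{\text{ads}}^{\text{total}}}{14} \approx \frac{79}{14} \text{ to } \frac{93}{14} \approx 5.6 \text{ to } 6.6 \; kT_r
\end{align*}
```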

By lowering the adsorption energy we can simulate the effect of weakening the nucleosome binding strength. In experiments, this is done for example by increasing the salt concentration around the nucleosome to decrease the binding strength between the phosphate groups of the DNA and the core [13, 14, 19]. We will take E_ads = 6.5 kT_r as a maximum value for the adsorption energy per binding site, at which we expect the DNA molecule to be fully or mostly wrapped. For E_ads < 5.5 kT_r we expect the DNA molecule to be partially unwrapped, and for E_ads < 4.5 kT_r mostly unwrapped, as (theoretically) the adsorption energy no longer exceeds the elastic energy at that point.

3.3 Boltzmann weights of unwrapping states

[...] probability we use Boltzmann weights:

\[ C_{ij} = e^{-\beta E_{ij}} = e^{-E_{ij}} \tag{3.4} \]

In the last step we set β = 1, as in our case we measure the deformation energy of the configuration of the DNA molecule at equilibrium, at room temperature T_r. In our simulation β(T) = 1/T is given in terms of the unitless T/T_r, so at room temperature we have β(T_r) = 1.

The partition function for this system is then given by:

\[ Z = \sum_{\substack{i, j \ge 0 \\ i + j < 14}} C_{ij} \tag{3.5} \]

where the upper bound means that at least one binding site remains closed (i + j ≤ 13 < 14), as otherwise we just have free DNA. With this, we can look at the relative probability that a sequence unwraps a certain number of base pairs, corresponding to the number of open binding sites. For this we will first use the so-called access probability and then look at the relative probability of unwrapping.
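The Boltzmann weights and the partition function of Equations (3.4) and (3.5) can be sketched as follows. The dictionary layout (state (i, j) mapped to E_ij) and the function names are assumptions made for this illustration, and the flat toy energies stand in for simulation results.

```python
import math


def unwrapping_states(n_sites=14):
    """All states (i, j) with i, j >= 0 and i + j < n_sites (at least one site closed)."""
    return [(i, j) for i in range(n_sites) for j in range(n_sites - i)]


def boltzmann_weights(energies, beta=1.0):
    """Map {(i, j): E_ij} to {(i, j): C_ij} with C_ij = exp(-beta * E_ij), Eq. (3.4)."""
    return {state: math.exp(-beta * e) for state, e in energies.items()}


def partition_function(weights, n_sites=14):
    """Partition function Z of Eq. (3.5): sum of C_ij over states with i + j < n_sites."""
    return sum(c for (i, j), c in weights.items() if i + j < n_sites)


# Toy input: every state has energy 0, so every weight is 1 and Z counts the states.
toy_energies = {state: 0.0 for state in unwrapping_states()}
Z = partition_function(boltzmann_weights(toy_energies))
```

There are 105 states with i + j < 14, so the toy example gives Z = 105; dividing each weight by Z yields the relative probability of the corresponding state.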

3.4 Access probability

Access probability P(k) is the chance that binding site k ∈ {1, ..., 14} is accessible (for example to an enzyme). We assume this is the case when binding site k is open: as we are only interested in how the sequence unwraps from the nucleosome, we assume that no additional binding sites need to be open for steric accessibility [12]. Site k is accessible when enough binding sites are open; this could happen from the left or from the right (or both), such that:

\[ P(k) = \frac{1}{Z} \left( \sum_{\substack{i \ge k \\ i + j < 14}} C_{ij} + \sum_{\substack{j > 14 - k \\ i + j < 14}} C_{ij} \right) \tag{3.6} \]

as site k is only accessible when, from either side, at least all sites up to site k are open. This P(k) gives us an accessibility ‘profile’ that can tell us something about the general ease of unwrapping and whether there is a bias in unwrapping: whether it unwraps more easily from the left (5’-end) or from the right (3’-end).
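A minimal sketch of Equation (3.6) is given below, assuming the Boltzmann weights are stored in a dictionary keyed by the state (i, j). The function name and the uniform toy weights are hypothetical and only serve to check the bookkeeping.

```python
def access_probability(k, weights, n_sites=14):
    """Access probability P(k) of Eq. (3.6) for binding site k in 1..n_sites."""
    allowed = {s: c for s, c in weights.items() if s[0] + s[1] < n_sites}
    z = sum(allowed.values())
    # Site k is reached from the left when at least k sites are open there ...
    from_left = sum(c for (i, j), c in allowed.items() if i >= k)
    # ... or from the right when more than n_sites - k sites are open there.
    from_right = sum(c for (i, j), c in allowed.items() if j > n_sites - k)
    return (from_left + from_right) / z


# Toy input: uniform weights over all states with i + j < 14.
toy_weights = {(i, j): 1.0 for i in range(14) for j in range(14 - i)}
profile = [access_probability(k, toy_weights) for k in range(1, 15)]
```

The two sums never count a state twice, because i ≥ k together with j > 14 − k would require i + j > 14. For uniform weights the profile is symmetric under k → 15 − k, so any asymmetry in a real profile reflects the energies of the sequence.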


3.5 Probability of unwrapping

Each Boltzmann weight corresponds to a relative probability that unwrapping state (i, j) occurs. Using these weights we can calculate the probability of n open binding sites:

\[ P_{\text{unwrap}}(n) = \frac{1}{Z} \left( \sum_{\substack{i + j = n \\ 0 \le i + j \le 14}} C_{ij} \right) \tag{3.7} \]

Note that here we allow the fully unwrapped state i+j = 14. We can

also calculate whether this unwrapping occurs relatively more from the left than from the right:

\[ RA_{\text{unwrap}}(n) = \frac{1}{Z} \left( \sum_{\substack{i > j \\ 0 \le i + j \le 14}} (+C_{ij}) + \sum_{\substack{i < j \\ 0 \le i + j \le 14}} (-C_{ij}) \right) \tag{3.8} \]

which we will call the relative asymmetry in unwrapping; a positive value indicates a left bias in unwrapping, and a negative value a right bias. We can relate the number of open binding sites n to the [...]
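The two quantities of this section can be sketched as follows. Two assumptions are made here and are not confirmed by the text: the normalization runs over all states passed in (including i + j = 14), and the sums in the relative asymmetry are restricted to i + j = n, as the argument n suggests. The function names and the toy weights are hypothetical.

```python
def unwrap_probability(n, weights):
    """Probability of n open binding sites, Eq. (3.7); Z sums over all given states."""
    z = sum(weights.values())
    return sum(c for (i, j), c in weights.items() if i + j == n) / z


def relative_asymmetry(n, weights):
    """Relative asymmetry in unwrapping, Eq. (3.8).

    Assumption: the sums are restricted to i + j == n.
    Positive values indicate a left bias, negative values a right bias.
    """
    z = sum(weights.values())
    left = sum(c for (i, j), c in weights.items() if i + j == n and i > j)
    right = sum(c for (i, j), c in weights.items() if i + j == n and i < j)
    return (left - right) / z


# Toy input: uniform weights over all states with i + j <= 14 ...
uniform = {(i, j): 1.0 for i in range(15) for j in range(15 - i)}
# ... and a copy in which the state (i, j) = (0, 5) is three times as likely.
biased = dict(uniform)
biased[(0, 5)] = 3.0
```

For the uniform weights the asymmetry vanishes for every n, while the extra weight on (0, 5) in the second example makes relative_asymmetry(5, biased) negative, i.e. a right bias.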

Chapter 4

Results and Interpretation

First we will focus on the DNA molecule when it is fully wrapped (all binding sites are closed), and then analyze the total energy of unwrapping state (i, j). As a start we will look at what happens if we ‘unwrap’ the DNA in stages from either the left or the right, and at the influence of the adsorption energy E_ads, after which we will look at all unwrapping states: the total energy ‘landscape’ of a DNA sequence. Then, using the Boltzmann weights C_ij, we can look at how the molecule will most likely unwrap and see whether there is a clear bias.

4.1 Elastic energy of a DNA molecule

We begin by analyzing the elastic energy (in units of kT_r) of a completely wrapped nucleosome, in other words, when all binding sites are closed. Here we only look at the elastic energy of the DNA molecule, so no adsorption energy of the binding sites has been taken into account. The elastic energy profile alone can tell us where most of the elastic energy is stored, which is directly correlated to where the DNA molecule is most forced to deviate from its ideal helical configuration.

Figure 4.1 shows the elastic energy stored between base pairs for three sequences: the artificial uniform sequence ‘X’, the Widom-601 sequence and the sea urchin 5S gene. The uniform X sequence consists of base pairs of X nucleotides. The X base pair step has ‘stiffness’ parameters that are the arithmetic average of those of all the (natural) base pair steps. The figure also shows the positions of the binding sites: black and red dashed lines indicate the base pair indices of each binding site (upper axis).

[Figure 4.1: three panels, (a) uniform X sequence, (b) 601 sequence, (c) 5S sequence; x-axis: base pair index (upper axis: binding site index 1 to 14); y-axis: elastic energy (kT_r)]

Figure 4.1: Elastic energy (kT_r) per base pair step of a completely wrapped nucleosome for the uniform X, 601 and 5S sequences. The dyad (middle of the sequence) is indicated by a grey dashed line at base pair index 73. The black and red dashed lines are located at the base pair indices of each binding site (see binding site index on the upper axis).

The elastic energy profile for the uniform X sequence is very symmetrical, which is easily seen from the location and height of the peaks in Figure 4.1a, and which is what we might expect for a completely uniform and self-palindromic sequence. The elastic energy profile for the 601 sequence is however asymmetrical: there are more high peaks in elastic energy to the right of the dyad than to the left, with the highest peak at binding site index 11. This means that most of the elastic energy is stored to the right of the dyad. From this alone we might expect that the 601 sequence will unwrap more from the right, as there is more energy stored in the DNA at binding sites to the right of the dyad, and thus more energy to be gained when opening those binding sites. For the 5S sequence the elastic energy per binding site is also asymmetric with respect to the dyad, but the differences between mirrored binding sites are small (compare k = 4 and k = 11; k = 5 and k = 10; k = 6 and k = 9). We will later see if it is possible to get a better indication of asymmetric unwrapping (see Figure 4.7). Next we will look at the total energy of the DNA molecule, which now includes the theoretical adsorption energy per binding site required for the DNA sequence to be bound to the histone core, for several stages in unwrapping. We call this energy the total energy of an unwrapping state.

4.2 Total energy of an unwrapping state

The total energy of an unwrapping state is the (corrected) deformation energy from the simulation minus the adsorption energy of the binding sites that remain closed (see Equation (3.3)). We can first look at how the total energy changes when unwrapping only from the left (j = 0) or only from the right (i = 0), for the uniform X sequence and the 601 sequence. This is shown in terms of the cumulative energy cost: how much energy is required to open more and more binding sites, or in other words, to unwrap further and further. We expect the energy cost of opening a binding site to be uniform for the uniform X sequence, and thus the cumulative energy cost of unwrapping to change steadily with the number of open binding sites; we also expect no difference between unwrapping from the left and from the right. We compare this with the 601 sequence, for which we do expect an asymmetry in unwrapping.
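One way to compute such a cumulative energy cost from the total energies E_ij is sketched below. It assumes the cost is measured relative to the fully wrapped state (0, 0), which the text does not state explicitly; the function names and the toy energy model (a linearly relaxing elastic energy) are purely illustrative.

```python
def cumulative_cost(e_tot, side, n_sites=14):
    """Cumulative energy cost of opening n = 0..n_sites sites from one side only.

    Assumption: the cost is E_n0 - E_00 (left) or E_0n - E_00 (right).
    """
    e_wrapped = e_tot[(0, 0)]
    states = [(n, 0) if side == "left" else (0, n) for n in range(n_sites + 1)]
    return [e_tot[s] - e_wrapped for s in states]


E_ADS = 6.5  # adsorption energy per binding site (kT_r), illustrative


def toy_total_energy(i, j):
    """Toy model: elastic energy relaxes linearly from 65 to 0 as sites open."""
    closed = 14 - i - j
    return 65.0 * closed / 14 - closed * E_ADS


e_tot = {(i, j): toy_total_energy(i, j) for i in range(15) for j in range(15 - i)}
left = cumulative_cost(e_tot, "left")
right = cumulative_cost(e_tot, "right")
```

In this symmetric toy model the left and right curves coincide and rise linearly, which is the behaviour expected for a perfectly uniform sequence; a sequence-dependent elastic energy would make the two curves differ.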

Figure 4.2 shows the cumulative energy cost of opening n binding sites from only the left or only the right. We can already see that generally the cumulative energy cost increases when more binding sites are opened. The cost of opening the first and the last binding site is especially high: around 5 kT_r. This may be explained by the DNA already being mostly straight at the ends, which means that the elastic energy gained by relaxing is very small, and thus the energy cost of opening the binding site is high. For the uniform X sequence we see no difference between unwrapping from the left and from the right, and the cumulative energy cost increases rather steadily, but not completely linearly as we might expect. This could be due to the DNA molecule not forming a perfect superhelix when bound, leading to places in the molecule with weaker (and stronger) curvature, and hence to a steeper (and less steep) increase in the cumulative energy cost.

[Figure 4.2: two panels, (a) uniform X sequence, (b) 601 sequence; x-axis: amount of open binding sites n; y-axis: cumulative energy cost (kT_r); curves: unwrapping from left and from right]

Figure 4.2: Cumulative energy cost of unwrapping the uniform X sequence and 601 sequence from only the left or right for E_ads = 6.5 kT_r.

For the 601 sequence unwrapping from the left and from the right is very similar at the start (n < 4) and at the end (n > 10), but there is a large difference near the center of the sequence (4 ≤ n ≤ 10). If we unwrap from the right, the cumulative energy cost decreases after 3 binding sites are opened (n = 3), which suggests that after opening binding site k = 3 it is very likely that the binding sites up to k = 6 will also be open, as this costs the same amount of energy. In general however, the sequence will most likely stay wrapped, since the resulting energy change from opening binding sites is overall highly positive. This changes when we lower the adsorption energy, as we can see in Figure 4.3.

[Figure 4.3: six panels, sequence X (left column) and 601 (right column) for E_ads = 6.0, 5.0 and 4.0 kT_r; x-axis: amount of open binding sites n; y-axis: cumulative energy cost (kT_r); curves: unwrapping from left and from right]

Figure 4.3: Cumulative energy cost of unwrapping sequences X (left) and 601 (right) from only the left or right end when lowering the adsorption energy to E_ads = 6.0 kT_r (upper), 5.0 kT_r (middle), 4.0 kT_r (lower).

In Figure 4.3 the states in unwrapping from only the left or only the right are shown for decreasing values of the adsorption energy. We can see that the adsorption energy E_ads heavily influences the cumulative energy cost of unwrapping: for high adsorption energy (E_ads > 5 kT_r) the cost generally increases with the number of open binding sites, while for low adsorption energy (E_ads < 5 kT_r) the cost decreases. Note that for all values of E_ads the difference between unwrapping from the left and from the right is present for the 601 sequence and absent for the uniform X sequence. We can also see that for E_ads < 5 kT_r the cost is generally negative, which indicates that the DNA molecule gains energy by unwrapping, so at this point we will most likely find fully unwrapped states. This information is condensed in Figure 4.4.

Figure 4.4 shows the cumulative energy cost for the uniform X sequence and the 601 sequence for several values of E_ads, indicated by the number at the end of each line plot. Whether the sequence favors unwrapping from the left or from the right depends on where the cumulative energy costs of the two sides intersect, which is indicated by a blue, orange or black intersection cross ‘×’. Blue indicates a right bias, orange a left bias, and black a lack of bias. The sequence favors unwrapping from the side where the cumulative energy cost is lower, thus the bias is towards the side for which the cost decreases when unwrapping even more (after the intersection). In short: when following the lines towards the middle, whichever has the lower energy indicates the bias. For the uniform sequence X we see a highly symmetrical cumulative energy cost in unwrapping and several black crosses, indicating no bias. For E_ads = 5.5 kT_r we see a couple of blue and orange crosses that are very close together, so we interpret this as no bias. For the 601 sequence we see several blue intersection crosses, indicating that the 601 sequence has a right bias in unwrapping. This method can show us at first glance whether the sequence has a bias in unwrapping, but it does not tell us what happens at intermediate stages of unwrapping, or when both ends may unwrap. To get a complete picture, we visualize the unwrapping states in the next section (Fig. 4.5).

[Figure 4.4: two panels, uniform X sequence (upper) and 601 sequence (lower); lower x-axis: amount of open binding sites n from the left; upper x-axis: amount of open binding sites n from the right (mirrored); y-axis: cumulative energy cost (kT_r); curves labelled by E_ads = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 kT_r]

Figure 4.4: Cumulative energy cost of unwrapping the uniform X sequence (upper) and the 601 sequence (lower) from only the left or right for several E_ads. The number at each line indicates the value of the adsorption energy used (in kT_r). The middle of the sequence is indicated by a dashed grey line at n = 7. Unwrapping from the right is mirrored and follows the upper x-axis (blue). Intersections are marked by a ‘×’, coloured orange for a left bias, blue for a right bias or black for no bias.

4.3 Total energy and relative probability of all unwrapping states

We will now look at all unwrapping states of the nucleosome. The total energy for every unwrapping state of the 601 sequence is given in Figure 4.5. When we lower the adsorption energy the total energy increases overall, as less energy is stored in the closed binding sites and it costs less energy to open them. However, the energy increase happens asymmetrically with respect to k_left and k_right and we can see a clear bias: the total energy mainly increases for states where k_left > k_right. This asymmetry is more pronounced when we look at the relative probability of these unwrapping states, using the Boltzmann weights C_ij.

We normalize the Boltzmann weights by the partition function Z to get

a relative probabilityCij/Z that a DNAmolecule will be found in a certain

unwrapping state (see Figure 4.6). When we lower the adsorption energy the relative probability to be in that unwrapping state increases, as the energy of that state decreases. This does not happen symmetrically with

respect to kleftand krightand we again see the same bias: the relative

prob-ability mainly increases for kright > kleft (below grey line), indicating a

preference in unwrapping from the right. Furthermore, we can see when

lowering the adsorption energy especially C0,5 sharply increases,

indicat-ing that all at once, about 5 bindindicat-ing sites will open from the right. When we lower the adsorption energy even further the relative probability

in-creases for all states (i,j), but mainly for unwrapping states where kright ≥5

and kright > kleft. The Appendix contains the total energy and relative

Boltzmann weights of all unwrapping states for all studied sequences: the uniform X sequence, 601 sequence, 5S gene and 601 derivatives 601MF, 601RTA, 601L in Figures B.1 to B.12.
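The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the thesis: it assumes that the Boltzmann weight of state (i, j) is C_ij = exp(−E_ij) with E_ij in units of kT_r, and that only states with i + j ≤ 14 exist. The function name and the toy energy values are hypothetical.

```python
import math

N_SITES = 14  # number of binding sites in the nucleosome model


def relative_probabilities(energy):
    """Return {(i, j): C_ij / Z} for a dict {(i, j): E_tot in kT_r}."""
    # Shift by the minimum energy before exponentiating for numerical
    # stability; the shift cancels in the ratio C_ij / Z.
    e_min = min(energy.values())
    weights = {state: math.exp(-(e - e_min)) for state, e in energy.items()}
    z = sum(weights.values())  # partition function of the shifted weights
    return {state: w / z for state, w in weights.items()}


# Toy energy landscape (hypothetical numbers, NOT simulation output):
# each open site costs 1.0 kT_r on the left and 0.5 kT_r on the right.
toy_energy = {
    (i, j): 1.0 * i + 0.5 * j
    for i in range(N_SITES + 1)
    for j in range(N_SITES + 1 - i)
}

probs = relative_probabilities(toy_energy)
print(sum(probs.values()))            # total probability
print(probs[(0, 5)], probs[(5, 0)])   # right- versus left-unwrapped state
```

In this toy landscape the state (0, 5) is more probable than (5, 0), which is the kind of right bias visible below the grey line in Figure 4.6.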


[Figure 4.5, six heat-map panels for E_ads = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 kT_r. x-axis: k_right (0–14); y-axis: k_left (0–14); colour scale: E_tot (kT_r).]

Figure 4.5: Total energy for all unwrapping states of the 601 sequence for several values of adsorption energy, where k_left indicates the amount of open binding sites from the left and k_right from the right. The gray line indicates where k_left = k_right.


[Figure 4.6, six heat-map panels for E_ads = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 kT_r. x-axis: k_right (0–14); y-axis: k_left (0–14); colour scale: C_ij/Z, logarithmic from 10^-9 to 10^0.]

Figure 4.6: Relative probability C_ij/Z for all unwrapping states of the 601 sequence for several values of adsorption energy, where k_left indicates the amount of open binding sites from the left and k_right from the right. The gray line is plotted where k_left = k_right.


We can see the effect that the outer ‘arms’ have on unwrapping the sequence by looking at the energy difference E − E^T or the relative Boltzmann weights C/C^T, with E^T and C^T being the transposes of E and C, respectively. Here the transpose gives the energy, respectively the relative probability, of the symmetrically flipped DNA sequence (going from 3’ to 5’ instead of from 5’ to 3’). The energy and probability differences do not depend on the adsorption energy and give us an idea of the influence of the DNA sequence on unwrapping. In areas where (E − E^T)_ij ≈ 0 or (C/C^T)_ij ≈ 10^0 (coloured green in Figure 4.7) the parts of the DNA sequence left and right of the dyad do not influence the relative probability of unwrapping significantly. This is the case for the uniform X sequence (Figures 4.7a and 4.7b): as expected we see almost no energy difference (check the scale), and consequently almost no difference in C/C^T. For the 601 sequence, however, C/C^T differs by several (four!) orders of magnitude between the areas where k_left is dominant (above the grey dashed line), neutral (where C/C^T ≈ 10^0), and where k_right is dominant (below the grey dashed line). This means that when opening e.g. 7 binding sites it is ≈ 10^4 times more likely that this happens from the right (3’-end) than from the left (5’-end), indicating a large right bias in unwrapping. Note that k = 7, 8 are the binding sites at the centre of the nucleosome, meaning the bias is largest when unwrapping almost half the DNA. Also note that for very small values k_left, k_right ≤ 3 and for very high values k_left, k_right ≥ 12 the difference is insignificant. We will later use this method to compare the effect of the outer arms of other sequences.


[Figure 4.7, six heat-map panels with x-axis k_right (0–14) and y-axis k_left (0–14): (a) energy difference E − E^T and (b) probability difference C/C^T for sequence X; (c) energy difference E − E^T and (d) probability difference C/C^T for sequence 601; (e) energy difference E − E^T and (f) probability difference C/C^T for sequence 5S. Colour scales: E − E^T in kT_r (±0.06 for X, ±6 for 601, ±3 for 5S); C/C^T logarithmic (10^-1 to 10^1 for X, 10^-4 to 10^4 for 601, 10^-2 to 10^2 for 5S).]

Figure 4.7: Effect of the outer arms on unwrapping for sequences X (upper), 601 (middle) and 5S (lower). The grey line indicates where k_left = k_right. Note the different scales.

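The statement in Section 4.3 that E − E^T and C/C^T do not depend on the adsorption energy can be checked with a short sketch. It rests on an assumption that is not spelled out in this section: the total energy of state (i, j) is taken to be the elastic energy minus E_ads for each of the 14 − i − j closed binding sites, with a uniform E_ads. Under that assumption the adsorption term is the same for (i, j) and (j, i) and cancels in the ratio. All names and numbers below are hypothetical.

```python
import math

N_SITES = 14


def total_energy(elastic, e_ads):
    """Assumed form: elastic energy minus e_ads per closed site (in kT_r)."""
    return {
        (i, j): e_el - (N_SITES - i - j) * e_ads
        for (i, j), e_el in elastic.items()
    }


def weight_ratio(energy):
    """C_ij / C_ji = exp(-(E_ij - E_ji)) for every state (i, j)."""
    return {
        (i, j): math.exp(-(e - energy[(j, i)]))
        for (i, j), e in energy.items()
    }


# Hypothetical asymmetric elastic energies (illustration only).
toy_elastic = {
    (i, j): -0.8 * i - 1.1 * j
    for i in range(N_SITES + 1)
    for j in range(N_SITES + 1 - i)
}

ratio_high = weight_ratio(total_energy(toy_elastic, 6.5))
ratio_low = weight_ratio(total_energy(toy_elastic, 4.0))

# Largest change of C/C^T between the two adsorption energies.
max_change = max(abs(ratio_high[s] - ratio_low[s]) for s in ratio_high)
print(max_change)
```

The printed change is zero up to floating-point rounding, so in this sketch the ratio carries only the sequence-dependent (elastic) part of the energy.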

4.4 Unwrap distribution

We can use the relative Boltzmann weights C_ij/Z to calculate the relative probability that, for a given adsorption energy, we find a DNA molecule with a certain number of open binding sites n (Equation (3.7)). This is done by summing all relative Boltzmann weights with i + j = n for every n in 0–14 (so summing over the diagonals perpendicular to the grey dashed line in Figures B.2 and 4.6). This tells us what impact the adsorption energy and the DNA sequence have on unwrapping.
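The diagonal sum can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a dictionary of relative probabilities C_ij/Z keyed by (i, j), and the split into left, no and right bias by comparing i and j is our reading of the bar colours in the figures, not a definition taken from Equation (3.7). The toy input is a uniform distribution, not simulation output.

```python
N_SITES = 14


def unwrap_distribution(probs, n_sites=N_SITES):
    """Sum C_ij / Z over i + j = n, split by which side dominates."""
    left = [0.0] * (n_sites + 1)
    none = [0.0] * (n_sites + 1)
    right = [0.0] * (n_sites + 1)
    for (i, j), p in probs.items():
        n = i + j  # index of the diagonal perpendicular to k_left = k_right
        if i > j:
            left[n] += p    # more sites open from the left
        elif i < j:
            right[n] += p   # more sites open from the right
        else:
            none[n] += p    # equally unwrapped from both sides
    total = [l + m + r for l, m, r in zip(left, none, right)]
    return total, left, none, right


# Toy input (hypothetical): every allowed state is equally likely.
states = [(i, j) for i in range(N_SITES + 1) for j in range(N_SITES + 1 - i)]
toy_probs = {s: 1.0 / len(states) for s in states}

total, left, none, right = unwrap_distribution(toy_probs)
for n, p in enumerate(total):
    print(n, round(p, 4))
```

For a symmetric input the left and right contributions are equal for every n, which is the behaviour expected for the uniform X sequence.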

[Figure 4.8, bar-plot panels (first panel titled ‘Unwrap distribution for E_ads = 6.5 kT_r’). x-axis: amount of open binding sites n (0–14); y-axis: relative probability; bar colours: left bias, no bias, right bias.]

Figure 4.8: Bar plot of the relative probability for n open binding sites for the uniform X sequence for several values of adsorption energy. The colour of the bar indicates the bias in unwrapping. Note that the probability scale changes when lowering E_ads.

[Figure 4.9, bar-plot panels. x-axis: amount of open binding sites n (0–14); y-axis: relative probability.]

Figure 4.9: Bar plot of the relative probability for n open binding sites for the 601 sequence for several values of adsorption energy. The colour of the bar indicates the bias in unwrapping. Note that the probability scale changes when lowering E_ads.

Figures 4.8 and 4.9 show bar plots of the relative probability for n open binding sites for the uniform X sequence and the 601 sequence. These bar plots give us a distribution of how likely it is to find a nucleosome with a certain number of open binding sites n.

We can see that for high E_ads (≥ 6.0 kT_r) both sequences are likely completely wrapped (n = 0), and lowering the adsorption energy generally increases the probability of more unwrapped states (higher n) occurring. For the uniform sequence X we can see that the left and right bias are very much equal, which is what we expect from a uniform sequence. For E_ads = 5.5 kT_r we see that the relative probability for higher n increases roughly equally for all n (0–0.1), and the relative probability for n = 0 decreases. For E_ads < 5.5 kT_r we see a large decrease of completely wrapped


have seen earlier that this might be the case due to the fact that the energy gained from unwrapping is very low, as the DNA is already mostly straight at that point. The unwrap distribution for the 601 sequence shows some similarities, but we see a large right bias in unwrapping for every value of E_ads, especially at n = 5. When E_ads is very low (4.0 kT_r) the bias more or less disappears for highly unwrapped states (n ≥ 10): the left and right bias are roughly the same, which we might expect, as at this point almost all DNA is unwrapped and it behaves mostly as free DNA. This is best seen in Figure 4.6: for E_ads = 4.0 kT_r the relative probability shows peaks for being nearly completely unwrapped from either the right or the left (k_left or k_right = 12 or 13).

4.5 Access probability

Figure 4.10 shows the access probability P(k) for the 601 sequence, the naturally occurring sea urchin 5S gene, and the theoretical uniform X sequence, for several values of adsorption energy. The access probability P(k) is the probability that binding site k is open, and gives us some insight into how the DNA sequence tends to unwrap.

Let us first focus on the top left panel (where E_ads = 6.5 kT_r). Generally, the access probability decreases towards the middle of the nucleosome (k = 7, 8), which indicates that binding sites near the ends (k ≤ 2, k ≥ 13) tend to be more accessible than those near the dyad. This is logical, as more binding sites need to be opened to reach the middle of the nucleosome. P(k) is given by the sum of the Boltzmann weights of all unwrapping states where binding site k is open, but mainly depends on the Boltzmann weights of unwrapping states (i, j) where the fewest binding sites are open, and thus mainly on unwrapping from one side (k < 7: left side, k > 8: right side) [12]. This causes the values of P(k) at the ends to heavily influence the access probability towards the middle, which we can see particularly for the 601 sequence: P(k) stays relatively large at the right end, from k = 14 down to k = 10, and this shifts the minimum of P(k) to the left (to k = 6 instead of k = 7, 8), which indicates a right bias in unwrapping. As we might expect from a uniform sequence, the access probability for the uniform X sequence is completely symmetrical in k, and also decreases towards the middle. P(k) for the 5S sequence looks very similar to that of the uniform sequence, but we can see a hint of a left bias in unwrapping.


[Figure 4.10, six panels for E_ads = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 kT_r. x-axis: open binding site k (1–14); y-axis: P(k) (logarithmic); curves: sequences 601, 5S and X.]

Figure 4.10: Probability that binding site k is accessible for the 601, 5S and uniform X sequence for several values of E_ads (kT_r). Binding site k is accessible when (at least) site k is open. Note the radical decrease in scale.
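The access probability of Section 4.5 can be illustrated with a small sketch. It assumes a site-numbering convention that is not stated explicitly in this section: in state (i, j) the sites 1..i are open from the left and the sites 14 − j + 1..14 are open from the right. The toy weights are hypothetical and only mimic a right-biased sequence.

```python
import math

N_SITES = 14


def access_probability(probs, n_sites=N_SITES):
    """P(k) for k = 1..n_sites from relative probabilities {(i, j): C_ij/Z}."""
    p_access = {}
    for k in range(1, n_sites + 1):
        # Assumed convention: site k is open if it is reached from the
        # left (k <= i) or from the right (k > n_sites - j).
        p_access[k] = sum(
            p for (i, j), p in probs.items() if k <= i or k > n_sites - j
        )
    return p_access


# Toy right-biased landscape (hypothetical): opening a site costs
# 1.0 kT_r on the left and 0.5 kT_r on the right.
weights = {
    (i, j): math.exp(-(1.0 * i + 0.5 * j))
    for i in range(N_SITES + 1)
    for j in range(N_SITES + 1 - i)
}
z = sum(weights.values())
toy_probs = {state: w / z for state, w in weights.items()}

p_k = access_probability(toy_probs)
k_min = min(p_k, key=p_k.get)  # least accessible binding site
print(k_min, p_k[1], p_k[14])
```

In this toy case the right end is more accessible than the left end and the minimum of P(k) lies left of the centre, the same qualitative signature of a right bias as described for the 601 sequence.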

In Figure 4.10 we can also see the effect of lowering the adsorption energy on the access probability. Generally the access probability increases for lower adsorption energy, for every binding site. For E_ads ≤ 5.0 kT_r it is hard to recognize the original profile, and P(k) tends to be roughly equal for every k, except at the ends (k = 1, 2, 13, 14). As noticed before, this may be because the DNA at the ends is already rather straight, so it does not gain much (elastic) energy by opening a binding site and unwrapping is rather costly at that point. At E_ads ≤ 5.0 kT_r most sequences


To see more easily whether a sequence has a bias, we can look at the relative access probability with respect to the (symmetric) uniform sequence X,

P_rel(k) = P(k) / P_X(k),

see Figure 4.11. This gives us an indication of how symmetric (or asymmetric) the access probability of a sequence is, independent of adsorption energy. We can easily see a right bias for the 601 sequence: P_rel(k > 7) > 1 and P_rel(k < 7) < 1. Similarly we can see a smaller left bias for the 5S sequence, but its profile is more symmetric in unwrapping than that of the 601 sequence, as its curve is flatter.
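As a small illustration of how the relative access probability exposes a bias, the ratio can be evaluated per binding site. The two profiles below are made-up numbers chosen only to show the criterion P_rel(k > 7) > 1 and P_rel(k < 7) < 1; they are not thesis data, and the function name is hypothetical.

```python
def relative_access(p_seq, p_uniform):
    """P_rel(k) = P(k) / P_X(k) for every binding site k."""
    return {k: p_seq[k] / p_uniform[k] for k in p_seq}


# Hypothetical access probabilities (illustration only):
# a symmetric reference profile and a profile enhanced on the right half.
p_x = {k: 10.0 ** -min(k, 15 - k) for k in range(1, 15)}
p_biased = {k: p_x[k] * (2.0 if k > 7 else 0.5) for k in p_x}

p_rel = relative_access(p_biased, p_x)
right_bias = all(p_rel[k] > 1 for k in range(8, 15)) and all(
    p_rel[k] < 1 for k in range(1, 8)
)
print(right_bias)
```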

[Figure 4.11. x-axis: open binding site k (1–14); y-axis: relative P(k), logarithmic from 10^-2 to 10^2; curves: sequences 601, 5S and X.]

Figure 4.11: Relative access probability with respect to the uniform X sequence for the 601 and 5S sequence.

However, the access probability cannot tell us very clearly how (asymmetrically) the sequence unwraps in stages. For that we will use the relative probability to open n binding sites and the relative asymmetry.


4.6 Probability of unwrapping

Using Equations (3.7) and (3.8) we can calculate the probability to open n binding sites, which we can relate to a number of unwrapped base pairs n_τ using Table 2.1. The unwrap distributions are given in Figure 4.12. Note that when all binding sites are closed and none are open (n = 0), there are still some unbound base pairs at the ends of the nucleosome.

[Figure 4.12, bar-plot panels. x-axis: amount of unwrapped bp (ticks 4, 16, 26, 37, 47, 58, 68, 78, 89, 99, 109, 120, 130, 141, 147); y-axis: relative probability; bar colours: left bias, no bias, right bias.]

Figure 4.12: Bar plot of the relative probability for n_τ base pairs unwrapped for the 601 sequence. A line is drawn through the summed relative probability, with the colour corresponding to the adsorption energy (high: blue; low: red).


These bar plots are the same as we have seen before (Figure 4.9), but now with the amount of unwrapped base pairs n_τ. We can condense this information by representing the relative probability for each value of E_ads with a coloured line in a single plot (see Figure 4.13) to compare it to experimental data from Mauney et al. [13], where the cumulative fraction and relative asymmetry of unwrapping the 601 and 5S sequences are given (see Figures 4.14 and 4.16). The cumulative fraction is the relative probability for unwrapping up to n_τ base pairs and the relative asymmetry is given by the difference in probability between left dominant (n_τ,left > n_τ,right) and right dominant (n_τ,left < n_τ,right) unwrapping. Positive and negative relative asymmetry indicate a bias in unwrapping from the left end (5’-end) and right end (3’-end), respectively.
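The two quantities defined above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the thesis: the base-pair values are the tick labels of the n_τ axis in Figures 4.12 to 4.16, read as the number of unwrapped bp for n = 0..14 (assumed to reproduce Table 2.1); the relative asymmetry is taken as the plain difference between left-dominant and right-dominant probability per n, without further normalization; and the input probabilities are a hypothetical right-biased toy model.

```python
import math
from itertools import accumulate

N_SITES = 14

# Tick labels of the n_tau axis, read as unwrapped bp for n = 0..14
# (assumption: these reproduce Table 2.1, which is not shown here).
BP_UNWRAPPED = [4, 16, 26, 37, 47, 58, 68, 78, 89, 99, 109, 120, 130, 141, 147]


def cumulative_and_asymmetry(probs, n_sites=N_SITES):
    """Return (cumulative probability, relative asymmetry) per n."""
    total = [0.0] * (n_sites + 1)
    asym = [0.0] * (n_sites + 1)
    for (i, j), p in probs.items():
        n = i + j
        total[n] += p
        if i > j:
            asym[n] += p  # left dominant counts positive
        elif i < j:
            asym[n] -= p  # right dominant counts negative
    return list(accumulate(total)), asym


# Toy right-biased probabilities (hypothetical, not simulation output).
weights = {
    (i, j): math.exp(-(1.0 * i + 0.5 * j))
    for i in range(N_SITES + 1)
    for j in range(N_SITES + 1 - i)
}
z = sum(weights.values())
toy_probs = {state: w / z for state, w in weights.items()}

cumulative, asym = cumulative_and_asymmetry(toy_probs)
for bp, c, a in zip(BP_UNWRAPPED, cumulative, asym):
    print(bp, round(c, 4), round(a, 4))
```

With this sign convention a right-biased model gives a negative relative asymmetry for every n ≥ 1, in line with the definition in the paragraph above.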

[Figure 4.13, two panels. x-axis: amount of unwrapped bp (n_τ) (ticks 4, 16, 26, 37, 47, 58, 68, 78, 89, 99, 109, 120, 130, 141, 147); y-axes: relative probability (left panel) and relative asymmetry (right panel); colour bar: E_ads (kT) = 6.0, 5.81, 5.58, 5.27, 4.82, 4.0.]

Figure 4.13: Relative unwrap probability and relative asymmetry for n_τ base pairs unwrapped for sequence 601 for several values of adsorption energy.

From Figure 4.13 we can acquire the cumulative sum of the relative probability, from now on called the cumulative probability, and relative asymmetry for the 601 and 5S sequences. We compare the cumulative fraction (experimental data) to the cumulative probability (simulations) and their relative asymmetry in Figures 4.14 and 4.16 for the 601 and 5S sequences.


[Figure 4.14, panel (a): x-axis: amount of unwrapped bp (n_τ) (ticks 0, 4, 16, 26, 37, 47, 58, 68, 78, 89, 99, 109, 120, 130, 141, 147); y-axes: cumulative probability and relative asymmetry; colour bar: E_ads = 6.0, 5.93, 5.84, 5.76, 5.66, 5.55, 5.43, 5.3, 5.14, 4.95, 4.73, 4.43, 4.0. Panel (b): image not recoverable.]

(a) Simulation, for several values of adsorption energy E_ads (kT_r).

(b) Experimental data, for several values of salt concentration (M).

Figure 4.14: Comparison of the results from our simulations versus experimental results for the 601 sequence. We map the adsorption energy inversely with the salt concentration.

The cumulative fraction / cumulative probability shows the fraction of DNA samples with up to a certain amount of unwrapped base pairs. When it increases sharply, it means that a large number of the DNA samples have that amount of unwrapped base pairs n_τ. If the curve is mostly flat between values of n_τ, it means that there are very few or almost no samples with that amount of unwrapped base pairs. As said before, we map the salt concentration in experiments inversely to the adsorption energy in simulations.

If we look at Figure 4.14b we see that for a low salt concentration (0.2 M, 0.5 M) the cumulative fraction increases sharply for 0 ≤ n_τ ≤ 20, which indicates that there are a lot of samples with that amount of unwrapped base pairs. After n_τ = 20 the curve is mostly flat, indicating that there are only a small number of samples with more than 20 bp unwrapped for that salt concentration. Conversely, for a high salt concentration (2 M) we see that the curve remains flat for n_τ < 120, and


5’ CTGGAGAATCCCGGTGCCGAGGCCGCTCAATTGGTCGTAGACAGCTCTAGCACCGCTTAAACGCACGTACGCG C

3’ TGTCCTACATATATAGACTGTGCACGGACCTCTGATCCCTCATTAGGGGAACCGCCAATTTTGCGCCCCCTGT

Figure 4.15: Template of the Widom-601 sequence [I strand?], with proposed sites of flexibility (highlighted) and supposed rigidness (underlined), starting from the 5’ side towards the dyad (‘C’) and ending at the 3’ side.

that at low salt concentration most structures will be mostly wrapped, and at high salt concentrations most structures are mostly unwrapped. It is important to note that the sequences used in the experiments seem to be complemented with respect to the sequences used in our simulations, so the biases are reversed. To eliminate this discrepancy, we flip the sign of our relative asymmetry, so that in both figures a positive relative asymmetry indicates a right bias in unwrapping and a negative one a left bias.

If we compare the experimental results with our predictions, we generally see the same behaviour in the cumulative probability of unwrapping: for high adsorption energy most sequences are wrapped; for low adsorption energy most sequences are mostly unwrapped. We see a striking behaviour in our predictions: the cumulative probability curve is mostly flat for 4 ≤ n_τ ≤ 37, indicating few structures with that amount of bp unwrapped, a feature that diminishes when lowering the adsorption energy. At high adsorption energy most structures are mostly wrapped: more than 80% are fully wrapped, and the remainder have up to n_τ = 47 bp unwrapped. This changes, however, when lowering the adsorption energy: the fraction with n_τ = 47 bp increases sharply. For very low adsorption energy (E_ads < 5 kT_r), this fraction decreases again, and most structures are mostly unwrapped (n_τ > 78). This may mean that a section of around 40 bp releases in one go more and more often when lowering the adsorption energy. Mauney et al. report this same behaviour, coining it the ‘spring-loaded latch’ mechanism, which relies on a large section of relatively rigid DNA releasing in one go. They relate the cause of this mechanism to certain relatively flexible base pair steps, and to areas where those are mostly absent, resulting in supposedly large rigid sections. In the sequence in Figure 4.15 they find a large rigid section (underlined) between the 3’-end and the dyad, around 20 bp long and 30 bp from the 3’-end, and show that this mechanism is supported by the relative asymmetry, as they find a large right bias in unwrapping at n_τ around 60–70 bp.

Our simulations predict a similar, but larger right bias in unwrapping at n_τ between 58 and 68 bp. This difference in magnitude could be explained as follows: they note a gradual unwrapping until around 20 bp have been released from both ends, while we find no such thing and predict the right side unwrapping in one go. This increases the magnitude of the relative asymmetry, as it is concentrated locally instead of spread out. Another similarity is that they find this peak in relative asymmetry to increase when increasing the salt concentration, and we likewise predict a higher relative asymmetry when lowering the adsorption energy.

Even though we do not predict a gradual unwrapping for low n_τ, we do predict a gradual unwrapping for low adsorption energy (E_ads < 5 kT_r) when more than half of the DNA has been released, which is supported by their results at high salt concentration (> 1 M).

Now we will compare the cumulative fraction / cumulative probability and their relative asymmetry for the 5S sequence in Figure 4.16.

[Figure 4.16, panel (a): x-axis: amount of unwrapped bp (n_τ) (ticks 0, 4, 16, 26, 37, 47, 58, 68, 78, 89, 99, 109, 120, 130, 141, 147); y-axes: cumulative probability and relative asymmetry; colour bar: E_ads = 6.0, 5.93, 5.84, 5.76, 5.66, 5.55, 5.43, 5.3, 5.14, 4.95, 4.73, 4.43, 4.0. Panel (b): image not recoverable.]

(a) Simulation, for several values of adsorption energy.

(b) Experimental data, for several values of salt concentration.

Figure 4.16: Comparison of the results from our model versus experimental results for sequence 5S. We map the adsorption energy inversely with the salt concentration.


5’ CTTCCAGGGATTTATAAGCCGATGACGTCATAACATCCCTGACCCTTTAAATAGCTTAACTTTCATCAAGCAA G

3’ TGGCTCGGGATACGACGAACTGAAGCCACTAGCCTGCTCTTGGCCATATAAGTCGTACCATACCAGCATCCGA

Figure 4.17: Template of the sea urchin 5S gene, with proposed sites of flexibility (highlighted) and supposed rigidness (underlined), starting from the 5’-end towards the dyad (‘G’) and ending at the 3’-end.

For the 5S sequence we see similar trends: for high adsorption energy / low salt concentration most sequences are wrapped, for low adsorption energy / high salt concentration most sequences are (mostly) unwrapped. The difference, however, is that most experimental samples are already (partially) unwrapped for high adsorption energy / low salt concentration, which we do not find in our simulations. Our simulations also indicate a far smoother unwrapping than the experimental data: their sequences mostly unwrap around 40 bp or at least 120 bp (indicated by the flat lines between n_τ = 40 and n_τ = 120). We can see a similar increase in structures between n_τ = 37 and n_τ = 47 bp unwrapped and after n_τ = 120, but the curves are not mostly flat between those points, indicating a gradual unwrapping. In the relative asymmetry we only see a (small) left bias in unwrapping at n_τ = 47. The experimental data show two small left bias peaks at n_τ = 25 and n_τ = 40, which could coincide with ours, and a larger right bias peak at n_τ = 50, which we do not predict. We do, however, have the relative asymmetry in the same order of magnitude. By looking at the sequence in Figure 4.17 we can again see supposed rigid sections, about 35 bp away from the 5’- and 3’-ends, of about 5 and 30 bp long respectively. But this time it is not clear whether this explains the large peak in asymmetry at n_τ = 47, as this would be in the middle of


5 Conclusion

We have seen that our nucleosome model can recover the bias in unwrapping nucleosomal DNA and how this depends on the sequence. We have shown that including the theoretical adsorption energy in the model can produce different stages in unwrapping nucleosomal DNA and that this effect differs for each sequence simulated so far.

We have seen that at high adsorption energy most structures will be almost entirely wrapped, and by lowering the adsorption energy we get an increase in structures that are mostly unwrapped; and we have compared predicted unwrap stages to experimental results.

For the 601 sequence the proposed ‘spring-loaded latch’ mechanism could be recovered, as well as the bias in unwrapping, but several intermediate stages in unwrapping seem to be missing, especially at the start of unwrapping. For the 5S gene it is not clear whether we recover the bias in unwrapping or the unwrapping stages found by Mauney et al. The spring-loaded latch mechanism is not captured for the 5S gene by our simulations, but it is debatable whether the ideas of the proposed rigid and flexible sections are well founded. It is unclear how locally the base pair steps influence the flexibility of the DNA molecule and thus whether the proposed rigid sections are actually rigid. Studies by de Bruin et al., for example, look at the eigenvalues of eigenmodes of the stiffness matrix for repeating sequence sub-units, like the repeating AT sequence or the A-tract sequence [20], which they admit only gives partial information about the flexibility or rigidity of these sections. Further research into flexible and rigid sections of the DNA sequence is required to explore this further.

We assume that the theoretical adsorption energy used to simulate the binding to the histone core is distributed uniformly across the nucleosome. It has been argued, however, that the adsorption energy per binding site increases towards the dyad [21]. A non-uniform adsorption energy distribution according to these findings has been explored in this study, but yields deviations from the uniform distribution that are deemed too extreme. The most striking result of using this distribution is a large decrease in accessibility towards the dyad and a larger fraction of mostly wrapped structures for all values of adsorption energy than for the uniform distribution, while mostly unwrapped structures were almost absent.

Another limitation of our model is that the DNA base pairs are assumed to be rigid plates that cannot twist. Also, the model only takes nearest neighbour interactions into account. Making a ‘twist move’ available to the base pair steps in simulations could increase the predicted fraction of mostly unwrapped structures, and maybe counter the extreme wrapping affinity when using a non-uniform adsorption energy distribution.

The proposed binding sites of the nucleosome could also be further scrutinized. In the model these sites are completely fixed in position and orientation and can only bind to certain base pair steps, while it could be that the histone core can afford some deviation from its ideal structure derived from crystal structures and bind to other base pair steps. Also, only interactions between the phosphate groups of the DNA backbone and the histone core are taken into account, while for example water-mediated hydrogen bonds to the oxygen atoms of the phosphate group are not. It is possible that more flexible parts of the DNA sequence could move closer to the histone core and form more hydrogen bonds, which could increase the supposed adsorption energy of that binding site [22].

In experiments [13] the unwrapping of nucleosomal DNA is increased by introducing counter-ions through an increase of the salt concentration of the prepared samples. The electron density of the DNA sequence and the histone core in the samples is measured by small-angle X-ray scattering (SAXS). To reduce the scattering from the histone core a high concentration of 50% sucrose is added, so that the electron density of the DNA sequence can be properly measured. These added ions and sugars could influence the unwrapping in ways we do not take into account in our model.

Further details of the simulation and results for all analyzed sequences can be found in the Appendix.


[1] H. Schiessel, Biophysics for beginners: a journey through the cell nucleus, Pan Stanford Publ, OCLC: 872055740.

[2] K. Luger, A. W. Mäder, R. K. Richmond, D. F. Sargent, and T. J. Richmond, Crystal structure of the nucleosome core particle at 2.8 Å resolution, Nature 389, 251.

[3] T. M. Ngo, Q. Zhang, R. Zhou, J. Yodh, and T. Ha, Asymmetric Unwrapping of Nucleosomes under Tension Directed by DNA Local Flexibility, Cell 160, 1135.

[4] H. S. Tims, K. Gurunathan, M. Levitus, and J. Widom, Dynamics of Nucleosome Invasion by DNA Binding Proteins, Journal of Molecular Biology 411, 430.

[5] G. Li, M. Levitus, C. Bustamante, and J. Widom, Rapid spontaneous accessibility of nucleosomal DNA, Nature Structural & Molecular Biology 12, 46.

[6] W. J. A. Koopmans, R. Buning, T. Schmidt, and J. van Noort, spFRET Using Alternating Excitation and FCS Reveals Progressive DNA Unwrapping in Nucleosomes, Biophysical Journal 97, 195.

[7] T. T. M. Ngo, J. Yoo, Q. Dai, Q. Zhang, C. He, A. Aksimentiev, and T. Ha, Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability, Nature Communications 7, 10813.

[8] B. D. Brower-Toland, C. L. Smith, R. C. Yeh, J. T. Lis, C. L. Peterson, and M. D. Wang, Mechanical disruption of individual nucleosomes reveals a reversible multistage release of DNA, Proceedings of the National Academy of Sciences 99, 1960.

[9] L. de Bruin, M. Tompitak, B. Eslami-Mossallam, and H. Schiessel, Why Do Nucleosomes Unwrap Asymmetrically?, The Journal of Physical Chemistry B 120, 5855.

[10] B. Eslami-Mossallam, R. D. Schram, M. Tompitak, J. v. Noort, and H. Schiessel, Multiplexing Genetic and Nucleosome Positioning Codes: A Computational Approach, PLOS ONE 11, e0156905.

[11] M. Tompitak, L. de Bruin, B. Eslami-Mossallam, and H. Schiessel, Designing nucleosomal force sensors, Physical Review E 95, 052402.

[12] J. Culkin, L. de Bruin, M. Tompitak, R. Phillips, and H. Schiessel, The role of DNA sequence in nucleosome breathing, The European Physical Journal E 40, 106.

[13] A. W. Mauney, J. M. Tokuda, L. M. Gloss, O. Gonzalez, and L. Pollack, Local DNA Sequence Controls Asymmetry of DNA Unwrapping from Nucleosome Core Particles, Biophysical Journal 115, 773.

[14] Y. Chen, J. M. Tokuda, T. Topping, J. L. Sutton, S. P. Meisburger, S. A. Pabit, L. M. Gloss, and L. Pollack, Revealing transient structures of nucleosomes as DNA unwinds, Nucleic Acids Research 42, 8767.

[15] Y. Chen, J. M. Tokuda, T. Topping, S. P. Meisburger, S. A. Pabit, L. M. Gloss, and L. Pollack, Asymmetric unwrapping of nucleosomal DNA propagates asymmetric opening and dissociation of the histone core, Proceedings of the National Academy of Sciences of the United States of America 114, 334.

[16] W. K. Olson, A. A. Gorin, X.-J. Lu, L. M. Hock, and V. B. Zhurkin, DNA sequence-dependent deformability deduced from protein–DNA crystal complexes, Proceedings of the National Academy of Sciences 95, 11163.

[17] F. Lankaš, J. Šponer, J. Langowski, and T. E. Cheatham, DNA Base-pair Step Deformability Inferred from Molecular Dynamics Simulations, Biophysical Journal 85, 2872.

[18] P. Prinsen and H. Schiessel, Nucleosome stability and accessibility of its DNA to proteins, Biochimie 92, 1722.

[20] L. De Bruin and J. H. Maddocks, cgDNAweb: a web interface to the cgDNA sequence-dependent coarse-grain model of double-stranded DNA, Nucleic Acids Research 46, W5.

[21] A. Fathizadeh, A. Berdy Besya, M. Reza Ejtehadi, and H. Schiessel, Rigid-body molecular dynamics of DNA inside a nucleosome, The European Physical Journal E 36.

[22] C. A. Davey, D. F. Sargent, K. Luger, A. W. Maeder, and T. J. Richmond, Solvent Mediated Interactions in the Structure of the Nucleosome Core Particle at 1.9 Å Resolution, Journal of Molecular Biology 319,
