nucleosomal
DNA
THESIS
submitted in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE
in
PHYSICS
Author : K. van Deelen
Student ID : s1282220
Supervisor : prof. dr. H. Schiessel
2nd corrector : prof. dr. L. Giomi
Huygens-Kamerlingh Onnes Laboratory, Leiden University P.O. Box 9500, 2300 RA Leiden, The Netherlands
April 23, 2020
Abstract
This study uses the rigid base pair (rbp) model and Markov chain Monte Carlo (MCMC) simulations to simulate the unwrapping of nucleosome core particles (NCPs). The model is sequence dependent and is used to study the bias towards left or right unwrapping and the effect of weakening the nucleosome bindings for several DNA sequences (Widom-601, the sea urchin 5S gene and 601 derivatives). We are able to focus on intermediate stages of unwrapping, which may not always be visible in experiments. We validate the model by comparing model outcomes to experimental results and we propose a (simple) method to find interesting sequences
1 Introduction
2 Methods
2.1 Models
2.2 Metropolis Algorithm
2.3 Unwrapping the DNA molecule
3 Practicalities and formulae
3.1 Measurements and correction
3.2 Binding to a nucleosome: adsorption energy
3.3 Boltzmann weights of unwrapping states
3.4 Access probability
3.5 Probability of unwrapping
4 Results and Interpretation
4.1 Elastic energy of a DNA molecule
4.2 Total energy of an unwrapping state
4.3 Total energy and relative probability of all unwrapping states
4.4 Unwrap distribution
4.5 Access probability
4.6 Probability of unwrapping
5 Conclusion
Appendices
A Sequence information
A.1 Sequences used
B Unwrapping states of all studied sequences
B.1 Total energy and relative probability
B.2 Cumulative probability and relative asymmetry in unwrapping
C Simulation details
C.1 Equilibration
C.2 Correlation
C.3 Simulated Annealing
C.4 Error Analysis
Chapter 1
Introduction
DNA can be a very long chain of hundreds of millions of nucleotides and has to be packed very compactly to fit inside a nucleus: its length is usually on the order of decimeters, while it has to fit in a sphere on the order of micrometers. To achieve this highly packed form, DNA is wrapped in several hierarchical structures [1]. At the lowest level of compaction the DNA is part of fundamental units called nucleosome core particles (NCPs), which consist of around 147 base pairs of double-stranded DNA wrapped in about 1 3/4 turns in a superhelical configuration around a protein core. This configuration requires the DNA to bend strongly and may induce deformations from its ideal double-helical structure [2]. The DNA contains a sequence of base pairs, whose order influences the mechanical properties of the double helix, such that very rigid or very flexible sections may occur. Very rigid sections of DNA are very unlikely to wrap around a histone core, as opposed to more flexible sections [3]. The induced bending also results in a certain sequence affinity for nucleosome positioning [4].
DNA is known to spontaneously unwrap and rewrap small sections due to thermal fluctuations, possibly for gene expression [4, 5], but sometimes a large section needs to unwrap in order for some sites in the buried nucleosomal DNA to become accessible. This unwrapping could happen symmetrically, from both sides with roughly equal probability [6], but some sequences are known to unwrap very asymmetrically, such as the widely studied artificial Widom-601 sequence and the naturally occurring sea urchin 5S gene [3, 7, 8], which could have implications for the directionality of transcription. However, unwrapping large sections of DNA requires the bindings between the histone core and the DNA to [...] understanding of DNA binding to the protein core: how can it be that gene expression occurs while DNA spends most of its time being mostly wrapped, and how can it unwrap in large sections; how does this depend on the DNA sequence, and how is it influenced by the weakening of the bindings between the DNA and the histone core?
We suggest this process is heavily influenced by the mechanical information of the sequence, as it influences the affinity to wrap around a histone core and may determine the bias in unwrapping, and by the concentration of counter-ions in the environment. Previous work using the rigid base pair (rbp) model in a nucleosome model has recovered basic nucleosome positioning rules [10], explored nucleosome unwrapping through an applied force [9, 11], and studied the sequence dependence of nucleosome breathing by looking at the accessibility of certain sites in the nucleosomal DNA to restriction enzymes [12]. This previous work already explored some aspects of nucleosome unwrapping, but did not take the binding strength of the DNA to the histone core into account.
Much experimental work has been done to understand how NCPs unwrap, but we will mainly focus on the results by Mauney et al., who found several stages in the unwrapping of the Widom-601 and sea urchin 5S gene sequences [13]. As in previous experimental work [14, 15], the nucleosome binding strength is weakened by increasing the salt concentration, introducing billions of Na+ ions. The increase in salt concentration gives rise to an increased number of highly unwrapped structures. We will map this salt concentration to the theoretical adsorption energy required for the DNA to bind to the histone core. In this report we will show to what extent our model is able to reproduce experimental results and develop several analyses that can suggest interesting sequences for further experimental research.
Chapter 2
Methods
2.1 Models
To simulate a (partially) wrapped DNA molecule we use a rigid base pair (rbp) model. We assume the DNA molecule behaves like a polymer chain consisting of linked rigid plates which represent the base pairs. Each base pair is connected to its previous and next neighbours in the chain and has six degrees of freedom: it can translate and rotate in three dimensions. The actual nucleosome is not explicitly constructed, but implied by forcing the polymer chain into a superhelical configuration. In our nucleosome a set of 28 constraints is enforced, corresponding to the binding sites where the DNA molecule is bound to the nucleosome and where the minor groove of the double helix faces inwards. At each of the fourteen binding sites there are two constraints (see Fig. 2.1). Each constraint fixes the location and orientation of the mid-frame between consecutive base pairs, called a base pair step, which corresponds to the phosphate groups of the DNA backbone.
Figure 2.1: Visualization of the rigid polymer chain ’wrapped’ around the histone core. The green plates are the base pairs of the DNA sequence (rbp model). The sequence is bound to the histone core at the binding sites, represented as dots (nucleosome model). Source: [12]
Even with these constraints, there is an enormous number of possible configurations of the molecule, but not all are equally likely to occur. For DNA to be wrapped in a nucleosome structure, it has to be sharply bent and twisted, and this induces deviations from the ideal superhelical structure. In our model we assume these deviations induce a quadratic deformation energy. The model takes only nearest-neighbour interactions into account, in which case the deformation energy for a base pair step (two consecutive base pairs) is given by:
$$E = \frac{1}{2}\,(q - q_0)^{T} K\,(q - q_0) \tag{2.1}$$
where q is a 6-dimensional vector containing the spatial and rotational degrees of freedom of a base pair step, q0 its intrinsic, preferred values, and K is the (6 by 6) ‘stiffness’ matrix, coupling the interaction between two base pairs. Both q0 and K differ for each type of base pair step, which results in the sequence dependence of the model; they are fully parametrized in the literature [16, 17]. In order to get likely configurations of the DNA molecule, we use the Metropolis algorithm to generate configurations distributed according to their probability density, from which we then sample.
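As an illustration of the quadratic deformation energy of a single base pair step, the following minimal Python sketch evaluates one half of (q − q0) contracted with K on both sides. The stiffness matrix and step coordinates used here are hypothetical toy values chosen for readability; they are not the parametrization of [16, 17], and the function names are illustrative only.

```python
def step_energy(q, q0, K):
    """Quadratic deformation energy 0.5 * (q - q0)^T K (q - q0) of one base pair step."""
    d = [a - b for a, b in zip(q, q0)]
    return 0.5 * sum(d[m] * K[m][n] * d[n] for m in range(6) for n in range(6))


def chain_energy(steps):
    """Nearest-neighbour model: sum the step energies over (q, q0, K) triples."""
    return sum(step_energy(q, q0, K) for q, q0, K in steps)


# Toy parameters (hypothetical): diagonal stiffness, zero intrinsic values.
K_toy = [[2.0 if m == n else 0.0 for n in range(6)] for m in range(6)]
q0_toy = [0.0] * 6
q_toy = [0.1, 0.0, 0.0, 0.0, 0.0, 0.2]

print(step_energy(q_toy, q0_toy, K_toy))
print(chain_energy([(q_toy, q0_toy, K_toy)] * 3))
```

A step that sits exactly at its intrinsic values q0 contributes zero energy, and the total energy of a chain is simply the sum over its steps because only nearest-neighbour interactions are included.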
2.2 Metropolis Algorithm
To generate random configurations of a wrapped DNA molecule we use the Metropolis algorithm, a Markov chain Monte Carlo method that generates a sequence of random samples from a probability distribution that is unknown or difficult to calculate. The initial state is the configuration where the molecule is forced into a superhelical structure [...] whether this base pair is part of a binding site or not. Unbound base pairs can perform every move (change location and orientation), but bound base pairs can only make moves that keep their mid-frame at the same location and orientation, as the mid-frame corresponds to the bound phosphate group of the DNA backbone.
After every base pair step has had a chance to make a move, a new configuration of the molecule has been constructed, for which the deformation energy can again be calculated. This new configuration is either accepted, forming the next state in the Markov chain, or rejected, in which case the current configuration is used again to create the next. Whether the new configuration is accepted or rejected depends on its deformation energy. If the energy of the new sample is higher than that of the previous one, it may be rejected, and the more likely so the larger the energy increase, as it is unlikely that the system will move to that state. If the energy is lower, the new sample is always accepted. The acceptance probability α depends on the difference in energy ∆E and the sampling factor β = 1/kBTr, where kB is the Boltzmann constant and Tr is room temperature:
$$\alpha = \begin{cases} 1 & \text{if } \Delta E \le 0 \\ e^{-\beta \Delta E} & \text{if } \Delta E > 0 \end{cases} \tag{2.2}$$
where ∆E = Enew − Eprevious.
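The acceptance rule of Equation (2.2) can be sketched in a few lines of Python. This is a minimal illustration on a one-dimensional toy energy, not the nucleosome simulation itself: the harmonic energy, the uniform proposal and all function names are assumptions made for the example.

```python
import math
import random


def acceptance_probability(delta_e, beta):
    """Metropolis acceptance probability for an energy change delta_e (Eq. 2.2)."""
    if delta_e <= 0:
        return 1.0
    return math.exp(-beta * delta_e)


def metropolis_step(state, energy, propose, beta, rng):
    """Propose a new state and accept or reject it; returns the next chain state."""
    candidate = propose(state, rng)
    delta_e = energy(candidate) - energy(state)
    if rng.random() < acceptance_probability(delta_e, beta):
        return candidate
    return state  # rejected: the current state is reused


# Toy system (hypothetical): one harmonic degree of freedom, energies in kT_r.
def toy_energy(x):
    return 0.5 * x * x


def toy_propose(x, rng):
    return x + rng.uniform(-1.5, 1.5)


rng = random.Random(0)
x = 3.0
samples = []
for _ in range(20000):
    x = metropolis_step(x, toy_energy, toy_propose, beta=1.0, rng=rng)
    samples.append(x)

# Discard the first part of the chain as equilibration before measuring.
measured = samples[2000:]
mean_energy = sum(toy_energy(s) for s in measured) / len(measured)
print(mean_energy)
```

For this toy system the mean energy at β = 1 should come out close to 1/2, the equipartition value for a single quadratic degree of freedom, which is a convenient sanity check on the sampler.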
If we do this for a large enough number of steps, the Markov chain will eventually converge to the probability distribution of configurations of the DNA molecule. We can then sample from this distribution (‘measurement’ step), and do this multiple times to obtain a sufficiently accurate estimate of the energy. Sequential samples in the Markov chain are highly correlated, as each sample depends on the previous one, but we can simply generate additional samples between ‘measurement’ steps to lower the correlation.
If we sample at a high temperature, T = 1 (that is, T = Tr), we get highly fluctuating measurements with a high standard deviation. To decrease the standard deviation of the measured energies, we use a high sampling factor β = 1000. Simulated annealing is used to ensure the Markov chain converges around the state with a global minimum in deformation energy. This means the system is slowly brought to equilibrium over a large number of steps, each time decreasing the sampling temperature T = 1/β, starting at T = 1 (room temperature) and ending at T = 1/1000. For more details on [...]
Binding site   First base pair step   Second base pair step
1              2, 3                   6, 7
2              14, 15                 17, 18
3              24, 25                 29, 30
4              34, 35                 38, 39
5              45, 46                 49, 50
6              55, 56                 59, 60
7              65, 66                 69, 70
8              76, 77                 80, 81
9              86, 87                 90, 91
10             96, 97                 100, 101
11             107, 108               111, 112
12             116, 117               121, 122
13             128, 129               131, 132
14             139, 140               143, 144
Table 2.1: Base pair indices for the first and second base pair step of each binding site. The total length of the sequence is 147 base pairs with indices 0, ..., 146.
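The simulated annealing of Section 2.2 lowers the sampling temperature T = 1/β from 1 to 1/1000 in stages. The sketch below generates one possible set of stages; the geometric spacing and the number of stages are assumptions made for illustration, since the schedule actually used is not specified in this section.

```python
def annealing_betas(beta_start=1.0, beta_end=1000.0, n_stages=7):
    """Geometrically spaced sampling factors from beta_start to beta_end (inclusive).

    The geometric spacing and n_stages are hypothetical choices.
    """
    ratio = (beta_end / beta_start) ** (1.0 / (n_stages - 1))
    return [beta_start * ratio ** s for s in range(n_stages)]


betas = annealing_betas()
temperatures = [1.0 / b for b in betas]

for beta, temperature in zip(betas, temperatures):
    # In a full simulation, a batch of Metropolis sweeps would run here at this beta
    # before moving on to the next, colder stage.
    print(f"beta = {beta:8.2f}   T = {temperature:.5f}")
```

Each stage starts from the final configuration of the previous one, so the chain is cooled gradually instead of being quenched directly at β = 1000.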
2.3 Unwrapping the DNA molecule
The DNA molecule is ‘wrapped’ due to 28 enforced constraints, corresponding to the 14 binding sites where the DNA is bound to the histone core. These constraints are imposed on the base pairs where the histone side chains would be bound to the phosphate groups of the DNA backbone. These phosphate groups lie between consecutive base pairs, a pair of which we call a base pair step. A binding site consists of two base pair steps; their locations are derived from crystal structures and reproduced by the nucleosome model [10], see Table 2.1. We will call base pair steps belonging to a binding site ‘bound’, and those not belonging to a binding site ‘unbound’.
Figure 2.2: Visualization of unwrapping a sequence by opening binding sites. At a) four binding sites from the left are open (i = 4) and one from the right (j = 1). This means that sites 1 ≤ k ≤ 4 and k = 14 are accessible, and e.g. k = 7 is not. At b) eight binding sites are open from the left, meaning that site k = 7 is now accessible. Source: [12]
To unwrap a DNA molecule the constraints of a binding site are revoked, allowing more degrees of freedom for the previously bound base pairs and leading to the relaxation of the then unwrapped section of the DNA molecule (see Figure 2.2). An unbound base pair step has 6 degrees of freedom for each base pair (3 translational and 3 rotational); a bound base pair step has a fixed mid-frame and thus fewer degrees of freedom: 3 for each base pair. We will consider ‘unwrapping states’ in which a certain number of binding sites is opened from the left end (5’-end) and from the right end (3’-end). We assume that a binding site 1 ≤ k ≤ 14 can only open if all binding sites between k and the left end or the right end are also open. We call the deformation energy of such a state, including the energy freed by opening the binding sites (see Section 3.2), Eij, with i [...]
Chapter 3
Practicalities and formulae
3.1 Measurements and correction
Our model gives us the free energy of the configuration of the DNA molecule. To get the elastic energy of unwrapping state (i, j), we need to correct for entropy, using the equipartition theorem:
$$E_{\mathrm{elastic}}(i,j) = E_{\mathrm{model}}(i,j) - E_{\mathrm{entropy}} = E_{\mathrm{model}}(i,j) - E_{\mathrm{equipartition}}(i,j) \tag{3.1}$$
$$E_{\mathrm{equipartition}}(i,j) = \frac{1}{2\beta}\cdot q = \frac{1}{2\beta}\cdot 6\cdot\bigl(147 - (28 - 2(i+j))\bigr) \tag{3.2}$$
with β = 1/kBT the sampling factor and q the total number of degrees of freedom, which depends on the total number of ‘free’ base pairs. For a completely wrapped state (i = j = 0) this corresponds to 147 − 28 free base pairs, as the 14 binding sites each fix 2 base pair steps, each consisting of 2 base pairs, in location and orientation. In Section 2.3 we have seen that for each bound base pair step the number of degrees of freedom of each of its base pairs is reduced by 3, that is, by 6 per bound step. Thus by opening (i + j) binding sites we gain 3 · 2 · 2 · (i + j) = 6 · 2(i + j) degrees of freedom.
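The degree-of-freedom counting behind Equations (3.1) and (3.2) can be checked with a short Python sketch. The function names are illustrative and the model energy passed in at the end is an arbitrary placeholder value, not a simulation result.

```python
N_BASE_PAIRS = 147
N_BINDING_SITES = 14


def degrees_of_freedom(i, j):
    """Total degrees of freedom q for unwrapping state (i, j), as in Eq. (3.2)."""
    bound_steps = 2 * (N_BINDING_SITES - i - j)  # two bound steps per closed site
    return 6 * (N_BASE_PAIRS - bound_steps)


def equipartition_energy(i, j, beta=1.0):
    """Entropic correction q / (2 * beta) of Eq. (3.2)."""
    return degrees_of_freedom(i, j) / (2.0 * beta)


def elastic_energy(e_model, i, j, beta=1.0):
    """Elastic energy of Eq. (3.1): model energy minus the equipartition term."""
    return e_model - equipartition_energy(i, j, beta)


print(degrees_of_freedom(0, 0))                             # fully wrapped: 6 * (147 - 28)
print(degrees_of_freedom(1, 0) - degrees_of_freedom(0, 0))  # gain per opened binding site
print(equipartition_energy(0, 0, beta=1000.0))
print(elastic_energy(400.0, 0, 0))                          # 400.0 is a placeholder model energy
```

Opening one binding site frees two base pair steps and therefore adds 12 degrees of freedom, in line with the factor 6 · 2(i + j) in the text.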
3.2 Binding to a nucleosome: adsorption energy
To get the total energy of an unwrapping state, we have to account for the effect of the DNA molecule being bound in a nucleosome. In the rbp model this is done by enforcing certain constraints, but we also need to take the energy required to open these bindings into account. We do this by adding it to the elastic energy acquired from the rbp model. Each closed binding site lowers the total energy of the system by the adsorption energy that is gained when the base pairs bind to the histone core, so that releasing a binding site costs this adsorption energy. We assume that this adsorption energy is equal for every binding site, that is, the adsorption energy distribution is uniform. Then for an unwrapping state (i, j) the total energy is given by:
$$E_{\mathrm{tot}}(i,j) = E_{\mathrm{elastic}}(i,j) - (14 - i - j)\cdot E_{\mathrm{ads}} \tag{3.3}$$
where Eads is the adsorption energy per binding site (in units of kTr). The magnitude of the adsorption term decreases as more binding sites are opened, since only the 14 − i − j closed sites contribute. From now on we will call the total energy of an unwrapping state, Etot(i, j), Eij.
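Equation (3.3) translates directly into a small helper function. The sketch below is illustrative: the function name is hypothetical, and the elastic energy of 65 kTr used in the example is only a round number for a fully wrapped state, not a value tied to a particular sequence.

```python
def total_energy(e_elastic, i, j, e_ads, n_sites=14):
    """Total energy E_ij of Eq. (3.3) for i sites open on the left and j on the right."""
    closed = n_sites - i - j
    if i < 0 or j < 0 or closed < 0:
        raise ValueError("i and j must be non-negative with i + j <= n_sites")
    return e_elastic - closed * e_ads


# Fully wrapped state with an illustrative elastic energy of 65 kT_r.
print(total_energy(65.0, 0, 0, e_ads=6.5))
# Same elastic energy with a weaker binding strength per site.
print(total_energy(65.0, 0, 0, e_ads=4.0))
```

With all 14 sites closed, the adsorption term outweighs the elastic energy at Eads = 6.5 kTr, while at Eads = 4.0 kTr it no longer does.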
The adsorption energy plays a big role in how a sequence unwraps from the nucleosome, as the total energy of an unwrapping state is given by the difference between the energy gained by becoming straighter (less bent and strained) and the energy it costs to open a binding site. It is theorized that the total adsorption energy should be slightly higher than the total elastic energy that is required to bend the DNA sequence into a superhelical configuration [18]. Our simulation predicts the bending energy of a fully wrapped configuration to be around 65 kTr, which is near predicted values [18]. We therefore expect the total adsorption energy to be of the same order, but slightly exceeding it, by 1 to 2 kTr per binding site, as it should be possible for the sequence to spontaneously unwrap and rewrap so that it may become accessible temporarily. This gives a total adsorption energy in the range 79 – 93 kTr. If we assume that the binding energy distribution is uniform, we get per binding site Eads ≈ 5.5 – 6.5 kTr.
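The arithmetic behind this estimate can be written out explicitly. The sketch below only uses the numbers quoted in the text (a bending energy of about 65 kTr, 14 binding sites and an excess of 1 to 2 kTr per site); the variable names are illustrative.

```python
e_bend_total = 65.0          # kT_r, bending energy of the fully wrapped state (from the text)
n_sites = 14                 # number of binding sites
excess_per_site = (1.0, 2.0)  # kT_r, assumed excess of adsorption over bending energy per site

# Total adsorption energy: bending energy plus the excess summed over all sites.
total_range = tuple(e_bend_total + n_sites * x for x in excess_per_site)

# Uniform distribution over the binding sites.
per_site_range = tuple(t / n_sites for t in total_range)

print(total_range)     # (79.0, 93.0)
print(per_site_range)
```

Dividing the totals by 14 gives roughly 5.6 and 6.6 kTr per site, which the text quotes as approximately 5.5 – 6.5 kTr.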
By lowering the adsorption energy we can simulate the effect of weakening the nucleosome binding strength. In experiments, this is done for example by increasing the salt concentration around the nucleosome to decrease the binding strength between the phosphate groups of the DNA and the core [13, 14, 19]. We take Eads = 6.5 kTr as the maximum value of the adsorption energy per binding site, for which we expect the DNA molecule to be fully or mostly wrapped. We expect the DNA molecule to be partially unwrapped for Eads < 5.5 kTr, and mostly unwrapped for Eads < 4.5 kTr, as (theoretically) the adsorption energy no longer exceeds the elastic energy at that point.
probability we use Boltzmann weights:
$$C_{ij} = e^{-\beta E_{ij}} = e^{-E_{ij}} \tag{3.4}$$
The last step uses β = 1, as in our case we measure the deformation energy of the configuration of the DNA molecule at equilibrium, at room temperature Tr. In our simulation β(T) = 1/T is given in terms of the unitless T/Tr, so at room temperature we have β(Tr) = 1.
The partition function for this system is then given by:
$$Z = \sum_{\substack{i,j \ge 0 \\ i+j<14}} C_{ij} \tag{3.5}$$
where the upper bound reflects that at least one binding site remains closed (i + j ≤ 13 < 14), as otherwise we just have free DNA. With this, we can look at the relative probability that a sequence unwraps a certain number of base pairs, corresponding to the number of open binding sites. For this we will first use the so-called access probability and then look at the relative probability of unwrapping.
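The Boltzmann weights of Equation (3.4) and the partition function of Equation (3.5) can be sketched as follows. The energy landscape used here is a hypothetical toy in which every opened binding site costs 1 kTr; real values of Eij would come from the simulation. The function names are illustrative.

```python
import math

N_SITES = 14


def unwrapping_states(max_open=N_SITES - 1):
    """All states (i, j) with i, j >= 0 and i + j <= max_open."""
    return [
        (i, j)
        for i in range(N_SITES + 1)
        for j in range(N_SITES + 1)
        if i + j <= max_open
    ]


def boltzmann_weights(energy, beta=1.0):
    """Weights C_ij = exp(-beta * E_ij) for a dict {(i, j): E_ij} (Eq. 3.4)."""
    return {state: math.exp(-beta * e) for state, e in energy.items()}


def partition_function(weights):
    """Z of Eq. (3.5): sum over states with at least one closed site (i + j < 14)."""
    return sum(w for (i, j), w in weights.items() if i + j < N_SITES)


# Toy landscape (hypothetical): every opened binding site costs 1 kT_r.
toy_energy = {(i, j): 1.0 * (i + j) for (i, j) in unwrapping_states()}
weights = boltzmann_weights(toy_energy)
Z = partition_function(weights)
probabilities = {state: w / Z for state, w in weights.items()}

print(len(weights), Z, probabilities[(0, 0)])
```

Dividing each weight by Z gives the relative probability of finding the molecule in that unwrapping state, and these probabilities sum to one over the states included in Z.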
3.4 Access probability
The access probability P(k) is the chance that binding site k ∈ {1, ..., 14} is accessible (for example to an enzyme). In our case we assume this is the case when binding site k is open, as we are only interested in how the sequence unwraps from the nucleosome; we therefore assume that no steric accessibility (additional binding sites that should be open) is required [12]. Site k is accessible when enough binding sites are open; this can happen from the left or from the right (or both), such that:
$$P(k) = \frac{1}{Z}\left( \sum_{\substack{i \ge k \\ i+j<14}} C_{ij} + \sum_{\substack{j > 14-k \\ i+j<14}} C_{ij} \right) \tag{3.6}$$
as site k is only accessible when, from either side, at least all sites up to site k are open. This P(k) gives us an accessibility ‘profile’ that can tell us something about the general ease of unwrapping and whether there is a bias in unwrapping: whether the sequence unwraps more easily from the left (5’-end) or from the right (3’-end).
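A minimal Python sketch of the access probability of Equation (3.6) is given below. The weights come from a hypothetical toy landscape with a constant cost per opened site on each side, which is an assumption made purely to show how a left or right bias appears in the profile; the function names are illustrative.

```python
import math

N_SITES = 14


def access_probability(k, weights):
    """P(k) of Eq. (3.6) for weights given as a dict {(i, j): C_ij}."""
    allowed = {s: w for s, w in weights.items() if s[0] + s[1] < N_SITES}
    z = sum(allowed.values())
    from_left = sum(w for (i, j), w in allowed.items() if i >= k)
    from_right = sum(w for (i, j), w in allowed.items() if j > N_SITES - k)
    return (from_left + from_right) / z


def toy_weights(cost_left, cost_right):
    """Hypothetical landscape: constant energy cost per opened site on each side."""
    return {
        (i, j): math.exp(-(cost_left * i + cost_right * j))
        for i in range(N_SITES + 1)
        for j in range(N_SITES + 1)
        if i + j < N_SITES
    }


symmetric = toy_weights(1.0, 1.0)
right_biased = toy_weights(1.0, 0.5)  # opening from the right is cheaper

profile_symmetric = [access_probability(k, symmetric) for k in range(1, N_SITES + 1)]
profile_right = [access_probability(k, right_biased) for k in range(1, N_SITES + 1)]

print(profile_symmetric[0], profile_symmetric[-1])
print(profile_right[0], profile_right[-1])
```

For the symmetric toy landscape the profile is mirror symmetric, P(k) = P(15 − k), whereas for the right-biased landscape the sites near the right end (large k) are more accessible than those near the left end.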
3.5 Probability of unwrapping
Each Boltzmann weight corresponds to a relative probability that unwrapping state (i, j) occurs. Using these weights we can calculate the probability of n open binding sites:
$$P_{\mathrm{unwrap}}(n) = \frac{1}{Z}\left( \sum_{\substack{i+j=n \\ 0 \le i+j \le 14}} C_{ij} \right) \tag{3.7}$$
Note that here we allow the fully unwrapped state i + j = 14. We can also calculate whether this unwrapping occurs relatively more from the left than from the right:
$$RA_{\mathrm{unwrap}}(n) = \frac{1}{Z}\left( \sum_{\substack{i>j \\ 0 \le i+j \le 14}} (+C_{ij}) + \sum_{\substack{i<j \\ 0 \le i+j \le 14}} (-C_{ij}) \right) \tag{3.8}$$
which we will call the relative asymmetry in unwrapping; a positive value indicates a left bias in unwrapping and a negative value a right bias. We can relate the amount of open binding sites n to the [...]
Chapter 4
Results and Interpretation
First we will focus on the DNA molecule when it is fully wrapped (all binding sites are closed), and then analyze the total energy of unwrapping state (i, j). As a start we will look at what happens if we ‘unwrap’ the DNA in stages from either the left or the right, and at the influence of the adsorption energy Eads, after which we will look at all unwrapping states: the total energy ‘landscape’ of a DNA sequence. Then, using the Boltzmann weights Cij, we can look at how the molecule will most likely unwrap and see whether there is a clear bias.
4.1 Elastic energy of a DNA molecule
We begin by analyzing the elastic energy (in units of kTr) of a completely wrapped nucleosome, in other words, when all binding sites are closed. Here we only look at the elastic energy of the DNA molecule, so no adsorption energy of the binding sites has been taken into account. The elastic energy alone can tell us where most of the elastic energy is stored, which is directly correlated to where the DNA molecule is most strongly forced to deviate from its ideal helical configuration.
Figure 4.1 shows the elastic energy stored between base pairs for three sequences: the artificial uniform sequence ‘X’, the Widom-601 sequence and the sea urchin 5S gene. The uniform X sequence consists of base pairs of X nucleotides. The X base pair step has ‘stiffness’ parameters that are the arithmetic average of those of all the (natural) base pair steps. The figure also shows the positions of the binding sites: black and red dashed lines indicate the base pair indices of each binding site (upper axis).
[Figure 4.1, panels: (a) uniform X sequence, (b) 601 sequence, (c) 5S sequence. Horizontal axis: base pair index; vertical axis: elastic energy (kTr); upper axis: binding site index (1 to 14).]
Figure 4.1: Elastic energy (kTr) per base pair step of a completely wrapped nucleosome for the uniform X, 601 and 5S sequence. The dyad (middle of the sequence) is indicated by a grey dashed line at base pair index 73. The black and red dashed lines are located at the base pair indices of each binding site (see binding site index in the upper axis).
The elastic energy profile of the uniform X sequence is very symmetrical, as is easily seen from the location and height of the peaks in Figure 4.1a; this is what we might expect for a completely uniform and self-palindromic sequence. The elastic energy profile of the 601 sequence is however asymmetrical: there are more high peaks in elastic energy to the right of the dyad than to the left, with the highest peak at binding site index 11. This means that most of the elastic energy is stored to the right of the dyad. From this alone we might expect that the 601 sequence will unwrap more from the right, as there is more energy stored in the DNA at binding sites to the right of the dyad, and thus more energy to be gained when opening those binding sites. For the 5S sequence the elastic energy per binding site is also asymmetric with respect to the dyad, but the differences between comparable binding sites are small (see k = 4 and k = 11; k = 5 and k = 10; k = 6 and k = 9). We will later see whether it is possible to get a better indication of asymmetric unwrapping (see Figure 4.7). Next we will look at the total energy of the DNA molecule, which now includes the theoretical adsorption energy per binding site required for the DNA sequence to be bound to the histone core, for several stages of unwrapping. We call this energy the total energy of an unwrapping state.
4.2 Total energy of an unwrapping state
The total energy of an unwrapping state is the (corrected) deformation energy from the simulation minus the adsorption energy of the binding sites that are still closed (see Equation (3.3)). We first look at how the total energy changes when unwrapping only from the left (j = 0) or only from the right (i = 0), for the uniform X sequence and the 601 sequence. This is shown in terms of the cumulative energy cost: how much energy is required to open more and more binding sites, or in other words, to unwrap further and further. We expect the energy cost of opening a binding site to be uniform for the uniform X sequence, and thus the cumulative energy cost of unwrapping to change steadily with the number of open binding sites; and we expect no difference between unwrapping from the left and from the right. We compare this with the 601 sequence, for which we do expect an asymmetry in unwrapping.
Figure 4.2 shows the cumulative energy cost of opening n binding sites from only the left or the right. We can already see that the cumulative energy cost generally increases when more binding sites are opened. The cost of opening the first and the last binding site is especially high: around 5 kTr. This may be explained by the DNA already being mostly straight at the ends, which means that the elastic energy gained by relaxing is very small, and thus the energy cost of opening the binding site is high. For the uniform X sequence we see no difference between unwrapping from the left and from the right, and the cumulative energy cost increases rather steadily, but not completely linearly as we might expect. This could be due to the DNA molecule not forming a perfect superhelix when bound, leading to places in the molecule with weaker (and stronger) curvature, and thus to a steeper (and less steep) increase in the cumulative energy cost.
[Figure 4.2, panels: (a) uniform X sequence, (b) 601 sequence. Horizontal axis: amount of open binding sites n (0 to 14); vertical axis: cumulative energy cost (kTr); curves: unwrapping from left and from right.]
Figure 4.2: Cumulative energy cost of unwrapping the uniform X sequence and 601 sequence from only the left or right for Eads = 6.5 kTr.
For the 601 sequence unwrapping from the left and from the right is very similar at the start (n < 4) and at the end (n > 10), but there is a large difference near the center of the sequence (4 ≤ n ≤ 10). We can see that if we unwrap from the right the cumulative energy cost decreases after 3 binding sites are opened (n = 3), which suggests that after opening binding site k = 3 it is very likely that the sites up to binding site k = 6 will also be open, as it costs the same amount of energy. In general, however, the sequence will most likely stay wrapped, since the resulting energy change from opening binding sites is overall highly positive. This changes when we lower the adsorption energy, as we can see in Figure 4.3.
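The dependence of the cumulative energy cost on the adsorption energy can be illustrated with a toy calculation. The sketch below assumes that the cumulative cost of opening n sites from one side is the total energy of that state minus the total energy of the fully wrapped state, which is inferred from the description above. It also assumes a hypothetical uniform landscape in which every closed site stores the same elastic energy e_relax; neither the value 4.6 kTr nor the function names come from the simulation.

```python
N_SITES = 14


def toy_total_energy(i, j, e_ads, e_relax=4.6):
    """Hypothetical E_ij: each closed site stores e_relax of elastic energy
    and contributes -e_ads of adsorption energy (same structure as Eq. 3.3)."""
    closed = N_SITES - i - j
    return closed * e_relax - closed * e_ads


def cumulative_cost(e_ads, side, n_max=N_SITES):
    """Energy of opening n = 0..n_max sites from one side, relative to fully wrapped."""
    e_wrapped = toy_total_energy(0, 0, e_ads)
    costs = []
    for n in range(n_max + 1):
        i, j = (n, 0) if side == "left" else (0, n)
        costs.append(toy_total_energy(i, j, e_ads) - e_wrapped)
    return costs


costs_strong = cumulative_cost(6.5, "left")  # strong binding
costs_weak = cumulative_cost(4.0, "left")    # weakened binding

print(costs_strong[-1], costs_weak[-1])
```

In this toy landscape the cost grows with n when Eads exceeds the elastic energy released per site and becomes negative when it is lower, which mirrors the sign change discussed for the lowered adsorption energies. A real sequence differs from this picture because the elastic energy released per site is not uniform.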
[Figure 4.3, six panels: sequence X (left column) and 601 (right column) for Eads = 6.0, 5.0 and 4.0 kTr (top to bottom). Horizontal axis: amount of open binding sites n (0 to 14); vertical axis: cumulative energy cost (kTr); curves: unwrapping from left and from right.]
Figure 4.3: Cumulative energy cost of unwrapping sequences X (left) and 601 (right) from only the left or right end when lowering the adsorption energy to Eads = 6.0 kTr (upper), 5.0 kTr (middle), 4.0 kTr (lower).
Figure 4.3 shows the states in unwrapping from only the left or the right for decreasing values of the adsorption energy. The adsorption energy Eads heavily influences the cumulative energy cost of unwrapping: for high adsorption energy (Eads > 5 kTr) the cost generally increases with the number of open binding sites, while for low adsorption energy (Eads < 5 kTr) the cost decreases. Note that for all values of Eads the difference between unwrapping from the left and from the right is present for the 601 sequence and absent for the uniform X sequence. We can also see that for Eads < 5 kTr the cost is generally negative, which indicates that the DNA molecule gains energy by unwrapping, so at this point we will most likely find fully unwrapped states. This information is condensed in Figure 4.4.
Figure 4.4 shows the cumulative energy cost for the uniform X sequence and the 601 sequence for several values of Eads, indicated by the number at the end of each line plot. Whether the sequence prefers to unwrap from the left or from the right depends on where the cumulative energy costs of the two sides intersect, which is indicated by a blue, orange or black intersection cross ‘×’. Blue indicates a right bias, orange a left bias, and black a lack of bias. The sequence prefers to unwrap from the side where the cumulative energy cost is lower, thus the bias is towards the side for which the cost decreases when unwrapping even more (after the intersection). In short: when following the lines towards the middle, whichever has the lower energy indicates the bias. For the uniform sequence X we see a highly symmetrical cumulative energy cost of unwrapping and several black crosses, indicating no bias. For Eads = 5.5 kTr we see a couple of blue and orange crosses that are very close together, which we interpret as no bias. For the 601 sequence we see several blue intersection crosses, indicating that the 601 sequence has a right bias in unwrapping. This method can show at first glance whether a sequence has a bias in unwrapping, but it does not tell us what happens at intermediate stages of unwrapping, or when both ends may unwrap. To get a complete picture, we visualize all unwrapping states in the next section (Fig. 4.5).
[Figure 4.4, two panels: uniform X sequence (upper) and 601 sequence (lower). Lower horizontal axis: amount of open binding sites n from the left (0 to 14); upper horizontal axis: amount of open binding sites n from the right (14 to 0); vertical axis: cumulative energy cost (kTr); curves labelled with Eads = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 for left and right.]
Figure 4.4: Cumulative energy cost of unwrapping the uniform X sequence (upper) and the 601 sequence (lower) from only the left or right for several Eads. The number at each line indicates the value of the adsorption energy used (in kTr). The middle of the sequence is indicated by a dashed grey line at n = 7. Unwrapping from the right is mirrored and follows the upper x-axis (blue). Intersections are marked by a ‘×’, coloured orange for a left bias, blue for a right bias or black for no bias.
4.3 Total energy and relative probability of all unwrapping states
We will now look at all unwrapping states of the nucleosome. The total energy for every unwrapping state of the 601 sequence is given in Figure 4.5. When we lower the adsorption energy the total energy increases overall, as less energy is stored in the closed binding sites, and it costs less energy to open them. However, the energy increase happens asymmetrically with respect to $k_\mathrm{left}$ and $k_\mathrm{right}$ and we can see a clear bias: the total energy mainly increases for states where $k_\mathrm{left} > k_\mathrm{right}$. This asymmetry is more pronounced when we look at the relative probability of these unwrapping states, using the Boltzmann weights $C_{ij}$.
We normalize the Boltzmann weights by the partition function $Z$ to get a relative probability $C_{ij}/Z$ that a DNA molecule will be found in a certain unwrapping state (see Figure 4.6). When we lower the adsorption energy the relative probability to be in an unwrapped state increases, as the energy of such a state decreases relative to that of the fully wrapped state. This does not happen symmetrically with respect to $k_\mathrm{left}$ and $k_\mathrm{right}$ and we again see the same bias: the relative probability mainly increases for $k_\mathrm{right} > k_\mathrm{left}$ (below grey line), indicating a preference in unwrapping from the right. Furthermore, we can see that when lowering the adsorption energy especially $C_{0,5}$ sharply increases, indicating that about 5 binding sites will open from the right all at once. When we lower the adsorption energy even further the relative probability increases for all states $(i,j)$, but mainly for unwrapping states where $k_\mathrm{right} \geq 5$ and $k_\mathrm{right} > k_\mathrm{left}$. The Appendix contains the total energy and relative Boltzmann weights of all unwrapping states for all studied sequences: the uniform X sequence, 601 sequence, 5S gene and 601 derivatives 601MF, 601RTA, 601L in Figures B.1 to B.12.
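The normalization step can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this study: it assumes that the Boltzmann weight is $C_{ij} = e^{-E_{ij}}$ with energies in units of $kT_r$, that rows index $k_\mathrm{left}$ and columns $k_\mathrm{right}$, and that only states with $i + j \leq 14$ exist. The function name and the toy energy matrix are hypothetical.

```python
import numpy as np

N_SITES = 14  # number of binding sites in the model


def relative_probabilities(e_tot):
    """Return C_ij / Z for a (15 x 15) matrix of total energies in kT_r.

    Assumed convention: rows = k_left (i), columns = k_right (j).
    States with i + j > N_SITES do not exist and get probability 0.
    """
    e = np.asarray(e_tot, dtype=float)
    i, j = np.indices(e.shape)
    valid = (i + j) <= N_SITES
    # Shift by the lowest valid energy for numerical stability;
    # the shift cancels when dividing by Z.
    shifted = np.where(valid, e - e[valid].min(), np.inf)
    weights = np.exp(-shifted)       # Boltzmann weights C_ij (up to a constant)
    return weights / weights.sum()   # divide by the partition function Z


# Toy example: a flat energy landscape gives every existing state equal weight.
p = relative_probabilities(np.zeros((N_SITES + 1, N_SITES + 1)))
print(p[0, 0], p.sum())
```

Subtracting the minimum energy before exponentiating avoids overflow for strongly negative energies and does not change the ratio $C_{ij}/Z$.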
[Figure 4.5: six heat-map panels, one per adsorption energy ($E_\mathrm{ads}$ = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 $kT_r$). Axes: $k_\mathrm{right}$ and $k_\mathrm{left}$ (0–14 each); colour scale: $E_\mathrm{tot}$ ($kT_r$).]
Figure 4.5: Total energy for all unwrapping states of the 601 sequence for several values of adsorption energy, where $k_\mathrm{left}$ indicates the amount of open binding sites from the left and $k_\mathrm{right}$ from the right. The gray line indicates where
[Figure 4.6: six heat-map panels, one per adsorption energy ($E_\mathrm{ads}$ = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 $kT_r$). Axes: $k_\mathrm{right}$ and $k_\mathrm{left}$ (0–14 each); colour scale: $C_{ij}/Z$ (logarithmic).]
Figure 4.6: Relative probability $C_{ij}/Z$ for all unwrapping states of the 601 sequence for several values of adsorption energy, where $k_\mathrm{left}$ indicates the amount of open binding sites from the left and $k_\mathrm{right}$ from the right. The gray line is plotted
We can see the effect that the outer ‘arms’ have on unwrapping the sequence by looking at the energy difference $E - E^T$ or the relative Boltzmann weights $C/C^T$, where $E^T$ and $C^T$ are the transposes of $E$ and $C$ respectively. The transpose corresponds in this case to the energy, respectively the relative probability, of the symmetrically flipped DNA sequence (going from 3’ to 5’ instead of from 5’ to 3’). The energy and probability differences do not depend on the adsorption energy and give us an idea of the influence of the DNA sequence on unwrapping. In areas where $(E - E^T)_{ij} \approx 0$ or $(C/C^T)_{ij} \approx 10^0$ (coloured green in Figure 4.7) the parts of the DNA sequence left and right of the dyad do not influence the relative probability of unwrapping significantly. This is the case for the uniform X sequence (Figures 4.7a and 4.7b). As expected for the uniform sequence X we see almost no energy difference (check the scale), and consequently no difference in the probability difference $C/C^T$. For the 601 sequence, however, we can see a difference of several (four) orders of magnitude in the probability difference $C/C^T$ between the areas where $k_\mathrm{left}$ is dominant (above grey dashed line), neutral (where $C/C^T \approx 10^0$), and $k_\mathrm{right}$ dominant (below grey dashed line). This means that when opening e.g. 7 binding sites it is $\approx 10^4$ times more likely that this happens from the right (3’-end) than from the left (5’-end), indicating a large right bias in unwrapping. Note that $k = 7, 8$ are the binding sites in the centre of the nucleosome, meaning the bias is largest when unwrapping almost half the DNA. Also note that for very small values $k_\mathrm{left}, k_\mathrm{right} \leq 3$ and for very high values $k_\mathrm{left}, k_\mathrm{right} \geq 12$ the difference is insignificant. We will later use this method to compare the effect of the outer arms of other
[Figure 4.7: six heat-map panels with axes $k_\mathrm{right}$ and $k_\mathrm{left}$ (0–14 each). Panels (a), (c), (e): energy difference $E - E^T$ ($kT_r$); panels (b), (d), (f): probability difference $C/C^T$ (logarithmic colour scale).]
Figure 4.7: Effect of the outer arms on unwrapping for sequences X (upper), 601 (middle) and 5S (lower). The grey line indicates where $k_\mathrm{left} = k_\mathrm{right}$. Note the
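The statement that $E - E^T$ and $C/C^T$ do not depend on the adsorption energy can be made explicit with a short derivation. This is only a sketch: it assumes that the total energy of state $(i,j)$ is an elastic part $E^{\mathrm{el}}_{ij}$ minus $E_\mathrm{ads}$ for each of the $14 - i - j$ closed binding sites, and that $C_{ij} = e^{-E_{ij}/kT_r}$. The symbol $E^{\mathrm{el}}_{ij}$ is introduced here for illustration only.

```latex
\begin{align}
E_{ij} &= E^{\mathrm{el}}_{ij} - (14 - i - j)\,E_{\mathrm{ads}}, \\
(E - E^{T})_{ij} &= E_{ij} - E_{ji} = E^{\mathrm{el}}_{ij} - E^{\mathrm{el}}_{ji}, \\
\left(C / C^{T}\right)_{ij} &= \frac{e^{-E_{ij}/kT_r}}{e^{-E_{ji}/kT_r}}
  = e^{-(E - E^{T})_{ij}/kT_r}.
\end{align}
```

Under these assumptions the adsorption term depends only on $i + j$, is therefore symmetric under $i \leftrightarrow j$, and cancels in both quantities.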
4.4 Unwrap distribution
We can use the relative Boltzmann weights $C_{ij}/Z$ to calculate the relative probability that, for a given adsorption energy, we find a DNA molecule with a certain amount of open binding sites $n$ (Equation (3.7)). This is done by summing all relative Boltzmann weights where $n = i + j$ for every $n$ in 0–14 (so summing over the diagonals perpendicular to the grey dashed line in Figures B.2 and 4.6). This can tell us what impact the adsorption energy and the DNA sequence information have on unwrapping.
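The summation over diagonals described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this study: rows are taken to index $k_\mathrm{left}$ ($i$) and columns $k_\mathrm{right}$ ($j$), and a state is labelled left-dominant when $i > j$ and right-dominant when $i < j$. The function name and the toy input are hypothetical.

```python
import numpy as np

N_SITES = 14  # number of binding sites in the model


def unwrap_distribution(p):
    """Sum p[i, j] over the anti-diagonals n = i + j, split by dominant side.

    Returns a dict of arrays of length 15 (n = 0..14).
    Assumed convention: rows = k_left (i), columns = k_right (j).
    """
    p = np.asarray(p, dtype=float)
    i, j = np.indices(p.shape)
    n = i + j
    masks = {"left": i > j, "none": i == j, "right": i < j}
    dist = {
        name: np.array([p[(n == m) & mask].sum() for m in range(N_SITES + 1)])
        for name, mask in masks.items()
    }
    dist["total"] = dist["left"] + dist["none"] + dist["right"]
    return dist


# Toy input: equal probability for every existing state (i + j <= 14).
rows, cols = np.indices((N_SITES + 1, N_SITES + 1))
p = np.where(rows + cols <= N_SITES, 1.0, 0.0)
p /= p.sum()
dist = unwrap_distribution(p)
print(dist["total"])
```

For a symmetric input such as this toy example the left and right contributions are identical for every $n$, which is the behaviour expected for a uniform sequence.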
[Figure 4.8: six bar-plot panels titled "Unwrap distribution for $E_\mathrm{ads}$ = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 $kT_r$". x-axis: amount of open binding sites $n$ (0–14); y-axis: relative probability; legend: left bias, no bias, right bias.]
Figure 4.8: Bar plot of the relative probability for $n$ open binding sites for the uniform X sequence for several values of adsorption energy. The colour of the bar indicates the bias in unwrapping. Note that the probability scale changes when lowering $E_\mathrm{ads}$.
In Figures 4.8 and 4.9 a bar plot of the relative probability for $n$ open binding sites is shown for the uniform X sequence
[Figure 4.9: six bar-plot panels titled "Unwrap distribution for $E_\mathrm{ads}$ = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 $kT_r$". x-axis: amount of open binding sites $n$ (0–14); y-axis: relative probability; legend: left bias, no bias, right bias.]
Figure 4.9: Bar plot of the relative probability for $n$ open binding sites for the 601 sequence for several values of adsorption energy. The colour of the bar indicates the bias in unwrapping. Note that the probability scale changes when lowering $E_\mathrm{ads}$.
and the 601 sequence. These bar plots give us a distribution of how likely it is to find a nucleosome with a certain amount of open binding sites $n$. We can see that for high $E_\mathrm{ads}$ ($\geq 6.0\,kT_r$) both sequences are likely completely wrapped ($n = 0$), and lowering the adsorption energy generally increases the probability of more unwrapped states (higher $n$) occurring. For the uniform sequence X we can see that the left and right bias are very much equal, which is what we expect from a uniform sequence. For $E_\mathrm{ads} = 5.5\,kT_r$ we see that the relative probability for higher $n$ increases roughly equally for all $n$ (0–0.1), and the relative probability for $n = 0$ decreases. For $E_\mathrm{ads} < 5.5\,kT_r$ we see a large decrease of completely wrapped
have seen earlier that this might be the case due to the fact that the energy gained from unwrapping is very low, as the DNA is already mostly straight at that point. The unwrap distribution for the 601 sequence shows some similarities, but we see a large right bias in unwrapping for every value of $E_\mathrm{ads}$, especially at $n = 5$. When $E_\mathrm{ads}$ is very low ($4.0\,kT_r$) the bias more or less disappears for highly unwrapped states ($n \geq 10$): the left and right bias are roughly the same, which we might expect, as at this point almost all DNA is unwrapped and it behaves mostly as free DNA. One can see that best in Figure 4.6: for $E_\mathrm{ads} = 4.0\,kT_r$ the relative probability shows peaks for being nearly completely unwrapped from either the right or the left ($k_\mathrm{left}$ or $k_\mathrm{right} = 12$ or $13$).
4.5 Access probability
Figure 4.10 shows the access probability $P(k)$ for the 601 sequence, the naturally occurring sea urchin 5S gene, and the theoretical uniform X sequence, for several values of adsorption energy. The access probability $P(k)$ is the probability that binding site $k$ is open, and can give us some insight into how the DNA sequence tends to unwrap.
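As an illustration of this definition, the following Python sketch computes $P(k)$ from a matrix of relative probabilities. It rests on assumptions that are not spelled out in this section: binding sites are numbered 1 to 14 from left to right, a state $(i, j)$ has its $i$ leftmost and $j$ rightmost sites open, rows index $k_\mathrm{left}$ and columns $k_\mathrm{right}$, and the input is zero for non-existing states ($i + j > 14$). The function name and toy input are hypothetical.

```python
import numpy as np

N_SITES = 14  # number of binding sites in the model


def access_probability(p):
    """Return P(k) for k = 1..14 from relative probabilities p[i, j].

    Site k counts as open in state (i, j) if it is among the i leftmost
    sites (k <= i) or among the j rightmost sites (k > N_SITES - j).
    """
    p = np.asarray(p, dtype=float)
    i, j = np.indices(p.shape)
    access = np.empty(N_SITES)
    for k in range(1, N_SITES + 1):
        is_open = (k <= i) | (k > N_SITES - j)
        access[k - 1] = p[is_open].sum()
    return access


# Toy example: all probability in the state with 3 sites open from the right.
p = np.zeros((N_SITES + 1, N_SITES + 1))
p[0, 3] = 1.0
print(access_probability(p))  # only sites 12, 13 and 14 are open
```

Dividing the result for one sequence element-wise by the result for the uniform sequence would give a relative access probability, under the same assumptions.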
Let us first focus on the top left figure (where $E_\mathrm{ads} = 6.5\,kT_r$). Generally, the access probability decreases towards the middle of the nucleosome ($k = 7, 8$), which indicates that binding sites near the ends ($k \leq 2$, $k \geq 13$) tend to be more accessible than those near the dyad. This is logical, as more binding sites need to be opened to reach the middle of the nucleosome. $P(k)$ is given by the sum of the Boltzmann weights of all unwrapping states where binding site $k$ is open, but mainly depends on the Boltzmann weights of unwrapping states $(i, j)$ where the least amount of binding sites are open, and thus mainly on unwrapping from one side ($k < 7$: left side, $k > 8$: right side) [12]. This causes values for $P(k)$ at the ends to heavily influence the access probability towards the middle, which we can particularly see for the 601 sequence: $P(k)$ stays relatively large at the right end, from $k = 14$ up towards $k = 10$, and this causes the minimum of $P(k)$ to lie more to the left (not at $k = 7, 8$ but at $k = 6$), which indicates a right bias in unwrapping. As we might expect from a uniform sequence, the access probability for the uniform X sequence is completely symmetrical in $k$, and also decreases towards the middle. $P(k)$ for the 5S sequence looks very similar to that of the uniform sequence, but we can see a hint of a left bias in
[Figure 4.10: six panels, one per adsorption energy ($E_\mathrm{ads}$ = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 $kT_r$). x-axis: open binding site $k$ (1–14); y-axis: $P(k)$ (logarithmic); legend: sequence 601, 5S, X.]
Figure 4.10: Probability that binding site $k$ is accessible for the 601, 5S and uniform X sequence for several values of $E_\mathrm{ads}$ ($kT_r$). Binding site $k$ is accessible when (at least) site $k$ is open. Note the radical decrease in scale.
In Figure 4.10 we can also see the effect of lowering the adsorption energy on the access probability. Generally the access probability increases for lower adsorption energy, for every binding site. Around $E_\mathrm{ads} \leq 5.0\,kT_r$ it is hard to recognize the original profile, and $P(k)$ tends to be roughly equal for every $k$, except at the ends ($k = 1, 2, 13, 14$). As noticed before, this may be because the ends of the DNA are already rather straight, so the DNA does not gain a lot of (elastic) energy by opening a binding site there and unwrapping is rather costly at that point. At $E_\mathrm{ads} \leq 5.0\,kT_r$ most sequences
To see more easily whether a sequence has a bias, we can look at the relative access probability with respect to the (symmetric) uniform sequence X,
\[ P_\mathrm{rel}(k) = \frac{P(k)}{P_X(k)}, \]
see Figure 4.11. This gives us an indication of how symmetric (or asymmetric) the access probability of a sequence is, independent of adsorption energy. We can easily see a right bias for the 601 sequence: $P_\mathrm{rel}(k > 7) > 1$ and $P_\mathrm{rel}(k < 7) < 1$. Similarly we can see a smaller left bias for the 5S sequence, but its profile is more symmetric in unwrapping than that of the 601 sequence, as its curve is flatter.
[Figure 4.11: x-axis: open binding site $k$ (1–14); y-axis: relative $P(k)$ (logarithmic); legend: sequence 601, 5S, X.]
Figure 4.11: Relative access probability with respect to the uniform X sequence for the 601 and 5S sequence.
However, the access probability cannot tell us very clearly how symmetrically or asymmetrically the sequence unwraps in stages. For that we will use the relative probability to open $n$ binding sites and the relative asymmetry.
4.6 Probability of unwrapping
Using Equations (3.7) and (3.8) we can calculate the probability to open $n$ binding sites, which we can correlate to a number of unwrapped base pairs $n_\tau$ using Table 2.1. The unwrap distributions are given in Figure 4.12. Note that when all binding sites are closed and none are open ($n = 0$), there are still some unbound base pairs at the ends of the nucleosome.
[Figure 4.12: six bar-plot panels titled "Unwrap distribution for $E_\mathrm{ads}$ = 6.5, 6.0, 5.5, 5.0, 4.5 and 4.0 $kT_r$". x-axis: amount of unwrapped bp (4, 16, 26, 37, 47, 58, 68, 78, 89, 99, 109, 120, 130, 141, 147); y-axis: relative probability; legend: left bias, no bias, right bias.]
Figure 4.12: Bar plot of the relative probability for $n_\tau$ base pairs unwrapped for the 601 sequence. A line is drawn through the summed relative probability, with the colour corresponding to the adsorption energy (high: blue; low: red).
These bar plots are the same as we have seen before (Figure 4.9), but now with the amount of unwrapped base pairs $n_\tau$. We can condense this information by representing the relative probability for each value of $E_\mathrm{ads}$ with a coloured line in a single plot (see Figure 4.13) to compare it to experimental data from Mauney et al. [13], where the cumulative fraction and relative asymmetry of unwrapping the 601 and 5S sequences are given (see Figures 4.14 and 4.16). The cumulative fraction is the relative probability for unwrapping up to $n_\tau$ base pairs and the relative asymmetry is given by the difference in probability between left dominant ($n_{\tau,\mathrm{left}} > n_{\tau,\mathrm{right}}$) and right dominant ($n_{\tau,\mathrm{left}} < n_{\tau,\mathrm{right}}$) unwrapping. Positive and negative relative asymmetry indicate a bias in unwrapping from the left end (5’-end) and right end (3’-end) respectively.
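The two quantities defined above can be sketched in Python as follows. This is a hedged illustration, not the implementation behind the figures: it assumes that the relative asymmetry is the plain difference between the left-dominant and right-dominant probabilities at each $n$ (Equation (3.8) may include a normalization that is not reproduced here), that left or right dominance can be read from $i > j$ or $i < j$, and that rows index $k_\mathrm{left}$. The base-pair values are read off the x-axis of Figure 4.12, with Table 2.1 remaining the authoritative mapping. All names are hypothetical.

```python
import numpy as np

N_SITES = 14  # number of binding sites in the model
# Unwrapped base pairs n_tau for n = 0..14 open sites, read off the x-axis
# of Figure 4.12 (the authoritative mapping is Table 2.1).
UNWRAPPED_BP = [4, 16, 26, 37, 47, 58, 68, 78, 89, 99, 109, 120, 130, 141, 147]


def cumulative_and_asymmetry(p):
    """Return (cumulative probability, relative asymmetry) for n = 0..14.

    Assumed convention: rows = k_left (i), columns = k_right (j);
    asymmetry = P(left dominant) - P(right dominant) at each n.
    """
    p = np.asarray(p, dtype=float)
    i, j = np.indices(p.shape)
    n = i + j
    ns = range(N_SITES + 1)
    total = np.array([p[n == m].sum() for m in ns])
    left = np.array([p[(n == m) & (i > j)].sum() for m in ns])
    right = np.array([p[(n == m) & (i < j)].sum() for m in ns])
    cumulative = np.cumsum(total)  # probability of at most n open sites
    asymmetry = left - right       # > 0: left bias, < 0: right bias
    return cumulative, asymmetry


# Toy input: 25% fully wrapped, 75% with five sites open from the right.
p = np.zeros((N_SITES + 1, N_SITES + 1))
p[0, 0] = 0.25
p[0, 5] = 0.75
cumulative, asymmetry = cumulative_and_asymmetry(p)
for bp, c, a in zip(UNWRAPPED_BP, cumulative, asymmetry):
    print(f"{bp:3d} bp  cumulative={c:.2f}  asymmetry={a:+.2f}")
```

In this toy example the cumulative curve stays flat up to 47 bp and then jumps at 58 bp, where the asymmetry is negative (a right bias in the sign convention of this paragraph).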
[Figure 4.13: two panels against the amount of unwrapped bp ($n_\tau$): relative probability and relative asymmetry; colour bar: $E_\mathrm{ads}$ ($kT$) with ticks 6.0, 5.81, 5.58, 5.27, 4.82, 4.0.]
Figure 4.13: Relative unwrap probability and relative asymmetry for $n_\tau$ base pairs unwrapped for sequence 601 for several values of adsorption energy.
From Figure 4.13 we can acquire the cumulative sum of the relative probability, from now on called the cumulative probability, and relative asymmetry for the 601 and 5S sequences. We compare the cumulative fraction (experimental data) to the cumulative probability (simulations) and their relative asymmetry in Figures 4.14 and 4.16 for the 601 and 5S sequences.
[Figure 4.14: cumulative probability and relative asymmetry against the amount of unwrapped bp ($n_\tau$); colour bar values from 6.0 to 4.0.]
(a) Simulation, for several values of adsorption energy $E_\mathrm{ads}$ ($kT_r$).
(b) Experimental data, for several values of salt concentration (M).
Figure 4.14: Comparison of the results from our simulations versus experimental results for the 601 sequence. We map the adsorption energy inversely with the salt concentration.
The cumulative fraction / cumulative probability shows the fraction of DNA samples with up to a certain amount of unwrapped base pairs. When it increases sharply, it means that a large amount of the DNA samples have that amount of unwrapped base pairs $n_\tau$. If the curve is mostly flat between values of $n_\tau$, it means that there are very few or almost no samples with that amount of unwrapped base pairs. As said before, we map the salt concentration in experiments inversely to the adsorption energy in simulations.
If we look at Fig. 4.14b we see that for a low salt concentration (0.2 M, 0.5 M) the cumulative fraction increases sharply between $0 \leq n_\tau \leq 20$, which indicates that there are a lot of samples with that amount of unwrapped base pairs. After $n_\tau = 20$ we see the curve is mostly flat, indicating that there are only a small number of samples with more than 20 bp unwrapped for that salt concentration. Conversely, for high salt concentration (2 M) we see that the curve remains flat for $n_\tau < 120$, and
5’ CTGGAGAATCCCGGTGCCGAGGCCGCTCAATTGGTCGTAGACAGCTCTAGCACCGCTTAAACGCACGTACGCG C
3’ TGTCCTACATATATAGACTGTGCACGGACCTCTGATCCCTCATTAGGGGAACCGCCAATTTTGCGCCCCCTGT
Figure 4.15: Template of the Widom-601 sequence [I strand?], with proposed sites of flexibility (highlighted) and supposed rigidness (underlined), starting from the 5’ side towards the dyad (‘C’) and ending at the 3’ side.
that at low salt concentration most structures will be mostly wrapped, and at high salt concentrations most structures are mostly unwrapped. It is important to note that the sequences used in the experiments seem to be complemented with respect to the sequences used in the simulations, so that the biases are reversed. To eliminate this discrepancy, we flip our relative asymmetry, so that in both figures a positive relative asymmetry indicates a right bias in unwrapping and a negative one a left bias.
If we compare the experimental results with our predictions, we generally see the same behaviour in the cumulative probability in unwrapping: for high adsorption energy most sequences are wrapped; for low adsorption energy most sequences are mostly unwrapped. We see a striking behaviour in our predictions: the cumulative probability curve is mostly flat between $4 \leq n_\tau \leq 37$ (indicating few structures with that amount of bp unwrapped), which decreases when lowering the adsorption energy. At high adsorption energy most structures are mostly wrapped: more than 80% are fully wrapped, and the remainder have up to $n_\tau = 47$ bp unwrapped. This however changes when lowering the adsorption energy: the fraction with $n_\tau = 47$ bp increases sharply. For very low adsorption energy ($E_\mathrm{ads} < 5\,kT_r$), this fraction decreases again, and most structures are mostly unwrapped ($n_\tau > 78$). This may mean that a section of around 40 bp releases in one go more and more often when lowering the adsorption energy. Mauney et al. report this same behaviour, coining it the ‘spring-loaded latch mechanism’, which operates on a large section of relatively rigid DNA releasing in one go. They relate the cause of this mechanism to certain relatively flexible base pair steps, and areas where those are mostly absent, resulting in supposedly large rigid sections. When looking at the sequence in Figure 4.15 they find a large rigid section (underlined) between the 3’-end and the dyad, around 20 bp long and 30 bp from the 3’-end, and show that this mechanism is supported by the relative asymmetry, as they find a large right bias in unwrapping at $n_\tau$ around 60–70 bp.
Our simulations predict a similar, but larger right bias in unwrapping at $n_\tau$ between 58–68 bp. This difference in magnitude could be explained by the following: they note a gradual unwrapping until around 20 bp have been released from both ends, while we find no such thing and predict the right side unwrapping in one go, which increases the magnitude of the relative asymmetry as it is concentrated locally instead of spread out. Another similarity is that they find this peak in relative asymmetry to increase when increasing the salt concentration, and we also predict a higher relative asymmetry when lowering the adsorption energy.
Even though we do not predict a gradual unwrapping for low $n_\tau$, we do predict a gradual unwrapping for low adsorption energy $E_\mathrm{ads} < 5\,kT_r$ when more than half of the DNA has been released, which is supported by their results at high salt concentration (> 1 M).
Now we will compare the cumulative fraction / cumulative probability and their relative asymmetry for the 5S sequence in Figure 4.16.
[Figure 4.16: cumulative probability and relative asymmetry against the amount of unwrapped bp ($n_\tau$); colour bar values from 6.0 to 4.0.]
(a) Simulation, for several values of adsorption energy.
(b) Experimental data, for several values of salt concentration.
Figure 4.16: Comparison of the results from our model versus experimental results for sequence 5S. We map the adsorption energy inversely with the salt concentration.
5’ CTTCCAGGGATTTATAAGCCGATGACGTCATAACATCCCTGACCCTTTAAATAGCTTAACTTTCATCAAGCAA G
3’ TGGCTCGGGATACGACGAACTGAAGCCACTAGCCTGCTCTTGGCCATATAAGTCGTACCATACCAGCATCCGA
Figure 4.17: Template of the sea urchin 5S gene, with proposed sites of flexibility (highlighted) and supposed rigidness (underlined), starting from the 5’-end towards the dyad (‘G’) and ending at the 3’-end.
For the 5S sequence we see similar trends: for high adsorption energy / low salt concentration most sequences are wrapped, for low adsorption energy / high salt concentration most sequences are (mostly) unwrapped. The difference, however, is that most experimental samples are already (partially) unwrapped for high adsorption energy / low salt concentration, which we do not find in our simulations. Our simulations also indicate a far smoother unwrapping than the experimental data: their sequences mostly unwrap around 40 bp or at least 120 bp (indicated by the flat lines between $n_\tau = 40$ and $n_\tau = 120$). We can see a similar increase in structures between $n_\tau = 37$ and $n_\tau = 47$ bp unwrapped and after $n_\tau = 120$, but the curves are not mostly flat between those points, indicating a gradual unwrapping. In the relative asymmetry we only see a (small) left bias in unwrapping at $n_\tau = 47$. The experimental data show two small left bias peaks at $n_\tau = 25$ and $n_\tau = 40$, which could coincide with ours, and a larger right bias peak at $n_\tau = 50$, which we do not predict. We do however find a relative asymmetry of the same order of magnitude. By looking at the sequence in Figure 4.17 we can again see supposed rigid sections, about 35 bp away from the 5’- and 3’-ends, of about 5 and 30 bp long respectively. But this time it is not clear whether this explains the large peak in asymmetry at $n_\tau = 47$, as this would be in the middle of
Chapter
5
Conclusion
We have seen that our nucleosome model can recover the bias in unwrapping nucleosomal DNA and how this depends on the sequence. We have shown that including the theoretical adsorption energy in the model can produce different stages in unwrapping nucleosomal DNA and that this effect differs for each sequence so far simulated.
We have seen that at high adsorption energy most structures will be almost entirely wrapped, and by lowering the adsorption energy we get an increase in structures that are mostly unwrapped; and we have compared predicted unwrap stages to experimental results.
For the 601 sequence the proposed ‘spring-loaded latch’ mechanism could be recovered, as well as the bias in unwrapping, but several intermediate stages in unwrapping seem to be missing, especially at the start of unwrapping. For the 5S gene it is not clear whether we recover the bias in unwrapping or the unwrapping stages found by Mauney et al. The spring-loaded latch mechanism is not captured for the 5S gene by our simulations, but it is arguable whether the ideas of the proposed rigid and flexible sections are well founded. It is unclear how locally the base pair steps influence the flexibility of the DNA molecule and thus whether the proposed rigid sections are actually rigid. Studies by de Bruin et al., for example, look at the eigenvalues of eigenmodes of the stiffness matrix for repeating sequence sub-units, like the repeating AT sequence or the A-tract sequence [20], which they admit only gives partial information about the flexibility or rigidness of these studied sections. Further research into flexible and rigid sections of the DNA sequence is required to explore this further.
We assume that the theoretical adsorption energy used to simulate the binding to the histone core is distributed uniformly across the nucleosome.
It has been argued however that the adsorption energy per binding site increases towards the dyad [21]. A non-uniform adsorption energy distribution according to these findings has been explored in this study, but yields deviations from the uniform distribution that are deemed too extreme. The most striking result of using this distribution is a large decrease in accessibility towards the dyad and a larger fraction of mostly wrapped structures for all values of adsorption energy than previously explored for the uniform distribution, while the fraction of mostly unwrapped structures was almost absent.
Another limitation of our model is that the DNA base pairs are assumed to be rigid plates that cannot twist. Also the model only takes nearest neighbour interactions into account. Making this ‘twist move’ available to the base pair steps in simulations could increase the predicted fraction of mostly unwrapped structures, and maybe counter the extreme wrapping affinity when using a non-uniform adsorption energy distribution.
Also the proposed binding sites of the nucleosome could be further scrutinized. In the model these sites are completely fixed in position and orientation and can only bind to certain base pair steps, while it could be that the histone core can afford some deviation from its ideal structure derived from crystal structures and make bindings to other base pair steps. Also only interactions between the phosphate groups of the DNA backbone and the histone core are taken into account, while for example water mediated hydrogen-bonds to the oxygen atoms of the phosphate group are not. It is possible that more flexible parts of the DNA sequence could move closer to the histone core and could cause more hydrogen-bonds, which could increase the supposed adsorption energy of that binding site [22].
In experiments [13] the unwrapping of nucleosomal DNA is increased by introducing counter-ions by increasing the salt concentration of prepared samples. The electron density of the DNA sequence and the histone core in the samples is measured by small-angle X-ray scattering (SAXS). To reduce the scattering from the histone core a high concentration of 50% sucrose is added, so the electron density of the DNA sequence could be properly measured. These added ions and sugars could influence the unwrapping in ways we do not take into account in our model.
Further details of the simulation and results for all analyzed sequences can be found in the Appendix.
[1] H. Schiessel, Biophysics for beginners: a journey through the cell nucleus, Pan Stanford Publ, OCLC: 872055740.
[2] K. Luger, A. W. Mäder, R. K. Richmond, D. F. Sargent, and T. J. Richmond, Crystal structure of the nucleosome core particle at 2.8 Å resolution, Nature 389, 251.
[3] T. M. Ngo, Q. Zhang, R. Zhou, J. Yodh, and T. Ha, Asymmetric Unwrapping of Nucleosomes under Tension Directed by DNA Local Flexibility, Cell 160, 1135.
[4] H. S. Tims, K. Gurunathan, M. Levitus, and J. Widom, Dynamics of Nucleosome Invasion by DNA Binding Proteins, Journal of Molecular Biology 411, 430.
[5] G. Li, M. Levitus, C. Bustamante, and J. Widom, Rapid spontaneous accessibility of nucleosomal DNA, Nature Structural & Molecular Biology 12, 46.
[6] W. J. A. Koopmans, R. Buning, T. Schmidt, and J. van Noort, spFRET Using Alternating Excitation and FCS Reveals Progressive DNA Unwrapping in Nucleosomes, Biophysical Journal 97, 195.
[7] T. T. M. Ngo, J. Yoo, Q. Dai, Q. Zhang, C. He, A. Aksimentiev, and T. Ha, Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability, Nature Communications 7, 10813.
[8] B. D. Brower-Toland, C. L. Smith, R. C. Yeh, J. T. Lis, C. L. Peterson, and M. D. Wang, Mechanical disruption of individual nucleosomes reveals a reversible multistage release of DNA, Proceedings of the National Academy of Sciences 99, 1960.
[9] L. de Bruin, M. Tompitak, B. Eslami-Mossallam, and H. Schiessel, Why Do Nucleosomes Unwrap Asymmetrically?, The Journal of Physical Chemistry B 120, 5855.
[10] B. Eslami-Mossallam, R. D. Schram, M. Tompitak, J. v. Noort, and H. Schiessel, Multiplexing Genetic and Nucleosome Positioning Codes: A Computational Approach, PLOS ONE 11, e0156905.
[11] M. Tompitak, L. de Bruin, B. Eslami-Mossallam, and H. Schiessel, Designing nucleosomal force sensors, Physical Review E 95, 052402.
[12] J. Culkin, L. de Bruin, M. Tompitak, R. Phillips, and H. Schiessel, The role of DNA sequence in nucleosome breathing, The European Physical Journal E 40, 106.
[13] A. W. Mauney, J. M. Tokuda, L. M. Gloss, O. Gonzalez, and L. Pollack, Local DNA Sequence Controls Asymmetry of DNA Unwrapping from Nucleosome Core Particles, Biophysical Journal 115, 773.
[14] Y. Chen, J. M. Tokuda, T. Topping, J. L. Sutton, S. P. Meisburger, S. A. Pabit, L. M. Gloss, and L. Pollack, Revealing transient structures of nucleosomes as DNA unwinds, Nucleic Acids Research 42, 8767.
[15] Y. Chen, J. M. Tokuda, T. Topping, S. P. Meisburger, S. A. Pabit, L. M. Gloss, and L. Pollack, Asymmetric unwrapping of nucleosomal DNA propagates asymmetric opening and dissociation of the histone core, Proceedings of the National Academy of Sciences of the United States of America 114, 334.
[16] W. K. Olson, A. A. Gorin, X.-J. Lu, L. M. Hock, and V. B. Zhurkin, DNA sequence-dependent deformability deduced from protein–DNA crystal complexes, Proceedings of the National Academy of Sciences 95, 11163.
[17] F. Lankaš, J. Šponer, J. Langowski, and T. E. Cheatham, DNA Basepair Step Deformability Inferred from Molecular Dynamics Simulations, Biophysical Journal 85, 2872.
[18] P. Prinsen and H. Schiessel, Nucleosome stability and accessibility of its DNA to proteins, Biochimie 92, 1722.
[20] L. De Bruin and J. H. Maddocks, cgDNAweb: a web interface to the cgDNA sequence-dependent coarse-grain model of double-stranded DNA, Nucleic Acids Research 46, W5.
[21] A. Fathizadeh, A. Berdy Besya, M. Reza Ejtehadi, and H. Schiessel, Rigid-body molecular dynamics of DNA inside a nucleosome, The European Physical Journal E 36.
[22] C. A. Davey, D. F. Sargent, K. Luger, A. W. Maeder, and T. J. Richmond, Solvent Mediated Interactions in the Structure of the Nucleosome Core Particle at 1.9 Å Resolution, Journal of Molecular Biology 319,