
Academic year: 2021


The handle http://hdl.handle.net/1887/138082 holds various files of this Leiden University dissertation.

Author: Brouwer, T.B.

Title: The role of linker DNA in chromatin fibers

Issue Date: 2020-11-04


Chapter 1

Introduction

The hereditary information of a cell, its genetics, is contained within the sequence of base pairs of its DNA. Every human cell, for instance, contains approximately 2 meters of DNA when it is fully stretched out. This massive length needs to be compacted thousands of times to fit inside the cellular nucleus, by wrapping the DNA into nucleosomes, which stack into dense chromatin fibers. At the same time, the information in the DNA needs to remain dynamically accessible for the cell to sustain itself, respond to its surroundings, or adapt during tissue growth or prenatal development. How the cell reconciles these opposing demands, strong compaction versus accessibility, is remarkable.

The level of chromatin compaction regulates which genes are actively read out and which genes are transcriptionally silent. This epigenetic regulation can be triggered by a complicated interplay of post-translational protein modifications [1–3], transcription factors [4–6], gene repressors [7, 8], or environmental cues [9–11]. Despite decades of research into the nature of chromatin fibers, there is still no consensus on their structure: 1-start, 2-start, or disorganized.

Furthermore, the debate on the prevalence of such organized structures in vivo is ongoing to this day.

In this thesis, we used single-molecule force spectroscopy to study how the length of the linker DNA, the DNA between adjacent nucleosomes, impacts the structure of folded chromatin fibers. Using a statistical mechanics model, we extracted their mechanical properties and quantified the energetic interactions between stacked nucleosomes. Single-molecule experiments have the advantage that each tether is measured individually, so they are not subject to ensemble averaging that can obscure unique properties. The bottleneck of single-molecule techniques, however, is throughput. Nevertheless, we stretched more than 1000 chromatin fibers. This was made possible by the development of a multiplexed Magnetic Tweezers apparatus, which can measure hundreds of molecules at once. Since multiplexing multiplies the data throughput by the number of molecules measured in parallel, we developed a lightweight image analysis algorithm to process these data. In addition, we created a rigid base pair Monte Carlo model to complement our experimental data with simulations, which can provide structural insight beyond what can be inferred from force and extension alone. This combination of techniques provided a thorough examination of chromatin condensation and epigenetic regulation in eukaryotes.

Furthermore, we investigated how archaeal histones compact DNA. Their mechanisms of compaction are suggestive of primordial, bacterial-like transcription regulatory mechanisms that preceded eukaryotic regulatory mechanisms. These experiments are highly relevant, as Archaea are one of the three domains of life and form the link between humans and the last universal common ancestor of life on earth.


1.1 The Tree of Life

It is overwhelming to fathom the complexity, coexistence, and codependency of the many forms of life that exist in symbiotic relationships in the ecosystems on earth. Organisms can be vastly different from each other, even when they live in close proximity. Life is everywhere: even places that seem hostile, such as hydrothermal vents or highly acidic environments, harbor life.

As Richard P. Feynman once mentioned, one can look at a flower and appreciate it as an artist, but the beauty of life is not limited to that [12].

Modern science grants us the opportunities to look inside the flower, inside the cells that constitute it, and appreciate the complicated intracellular mechanisms that work together in perfect harmony to sustain life.

The complexity of life originated 3.5 billion years ago from a simple, single cell. Every generation added layers of complexity, driven by evolution and adaptation to an ever-changing environment. With time, the world inside cells became increasingly complex. Our current understanding of evolution arose from the combination of two different ideas by two different people, who walked the earth at the same time but supposedly never met: Charles Darwin and Gregor Mendel.

1. Darwin realized that evolution depends on the existence of heritable variability within a species to generate the differences between ancestral and descendant populations, as stated in On the Origin of Species, his groundbreaking book from 1859, which is considered the foundation of evolutionary biology [13]. The book was highly controversial because its ideas challenged creationist beliefs. In addition, the theory was problematic for the scientific community, since there was a lack of direct evidence for natural selection at the time, and genetics was not well understood.

2. Mendel investigated inheritance by cross-pollinating pea plants and discovered that crossbreeding of animals and plants could favor certain desirable traits [14]. He coined the terms recessive and dominant, and established many of the rules of heredity, which are now known as the laws of Mendelian inheritance. He published his ideas before a proper understanding of the cellular basis of sexual reproduction was achieved.

For many years, it was not obvious that Mendel’s studies of heredity had any relevance to Darwin’s theory of evolution. It would take nearly 60 years for their ideas to be combined in the modern theory of evolution. Today, the most compelling evidence for evolution can be found in the fact that even though highly-developed multicellular organisms such as humans are vastly different and infinitely more complex than simple, unicellular organisms, we share innumerable mechanisms that sustain (our) life.


Figure 1.1

Nucleosome folding in humans is highly analogous to DNA compaction in Archaea. a) The phylogenetic tree of life shows that Archaea form a third branch between Bacteria and Eukaryota. Figure reprinted from Wikimedia Commons. A comparison of three archaeal histone HMfB homodimers (b) and a hexasome (a nucleosome with one H2A-H2B dimer and the histone tails removed) (c) shows that Archaea express a form of histones that is structurally homologous to human histones. The similarity of the histone fold of an HMfB homodimer (d) and an H3-H4 heterodimer (e) is remarkable. Figure adapted from [15].

By studying how these mechanisms increase in complexity between species we can retrace our evolutionary path and organize it in the tree of life, which shows how all kingdoms of life are related, including living and extinct species.

Figure 1.1a depicts part of the phylogenetic tree of life, dividing the living world into three domains: Bacteria, Archaea, and Eukaryota. The trunk of the tree links the three domains together, indicating that all life originates from a common ancestor.

According to most definitions, the cell is the smallest common denominator of life. Some cells are complete organisms in themselves, whereas others are part of a multicellular organism. Cells can respond to environmental cues, metabolize, and reproduce by making copies of themselves. Frequently, unicellular organisms live with multicellular organisms in a complex symbiosis. Cells can look vastly different from each other. Imagine, for instance, the brick-like structure of a plant cell versus the complicated structure of a nerve cell with its long protrusions. Both are cells, yet at first glance they bear no resemblance to each other. Nevertheless, all cells share distinct universal features and are made up of the same major classes of organic molecules: nucleic acids, proteins, carbohydrates, and lipids.

There are two categories of cells: prokaryotes and eukaryotes. Prokaryotic cells are the simplest form of life. They are formed by a membrane surrounding the cytoplasm and contain no nucleus or organelles. Prokaryotes are colloquially known as Bacteria. Despite their simplicity, they constitute a huge fraction of life on Earth. Eukaryotic cells are more complicated and contain membrane-bound organelles in addition to a nucleus. They can exist as unicellular organisms, or as multicellular organisms such as fungi, plants, and animals.

Most eukaryotes depend on oxygen, whereas some prokaryotes can live on other substances; hence, prokaryotes can be found in almost every environment on Earth.

Part of the complexity of eukaryotes seems to have arisen when a larger prokaryotic cell engulfed a smaller one and both continued to live in a symbiotic relationship [16]. The engulfed cell thrived within the larger one as an organelle. Chloroplasts and mitochondria in particular are thought to have arisen in this manner, as they still retain a separate genome and divide independently.

The microbiologist Carl Woese argued in 1977, based on the analysis of ribosomal RNA, that the prokaryotes should be further divided into two domains: Bacteria and Archaea (as shown in Figure 1.1a) [17–19]. Archaea are also prokaryotic unicellular organisms (without compartmentation), but their genetic makeup is vastly different from that of Bacteria. Archaea possess a unique evolutionary history and diverse metabolisms, which allow them to feed on inorganic matter and to withstand extreme environments such as high temperatures, extremely salty conditions, or places without oxygen.

Even though unicellular Archaea are evolutionarily quite distant from eukaryotes, we can observe remarkable similarities in cellular mechanisms, for instance in genomic compaction and organization. Eukaryotic cells compact three-quarters of their genome by wrapping their DNA around a histone octamer to form a nucleosome. Figures 1.1b and 1.1c illustrate that nucleosome folding in humans is highly analogous to DNA compaction in Archaea. Figures 1.1d and 1.1e furthermore show that the structure of the building blocks is virtually identical. These similarities between two domains that are far apart on the phylogenetic tree of life form compelling evidence that all life on earth today originated from a common ancestor.

1.2 DNA

If life on earth can be expressed as a tree, the root of the tree is surely formed by DNA. DeoxyriboNucleic Acid (DNA) is the most important molecule shared by our common ancestor and virtually all lifeforms today. Even viruses, a peculiar form of life, carry genetic material. DNA contains the blueprint for all life: it dictates growth, development, function, and regulates reproduction.

DNA is unique to each individual, although all cells in a multicellular body share the same DNA.

Many researchers need to be credited for the discovery and understanding of DNA. DNA was first isolated in 1869 by the Swiss physician Friedrich Miescher from the nuclei of white blood cells [20]. He came across a substance that was high in phosphorus and resistant to proteolysis, and he named it nuclein. The implications of the discovery were long underestimated.

For years, scientists continued to believe that proteins held all of our genetic content, as they could not fathom that nuclein was complex enough to contain all of the information.

Several decades later, the American researcher Phoebus Levene, a specialist in the chemistry of biomolecules, discovered the three major components of a single nucleotide in 1909: phosphate, sugar, and a nucleobase [21]. He furthermore identified four unique nucleobases, adenine (A), thymine (T), cytosine (C), and guanine (G), which appeared to alternate randomly.

The biochemist Erwin Chargaff was subsequently credited with two major discoveries. First, he discovered that the total amount of purines (A + G) and the total amount of pyrimidines (C + T) were equal, commonly referred to as Chargaff’s rule. Second, he found that the nucleotide composition differed among different species – revealing its hidden purpose [22, 23]. These discoveries paved the way for a revolutionary publication that saw the light in the early fifties.
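Chargaff's first rule is exactly what one expects once DNA is double-stranded with complementary pairing, as Watson and Crick would later explain. The short Python sketch below is only an illustration with an arbitrary, hypothetical sequence and illustrative helper names: it builds the partner strand, counts bases over both strands, and shows that purines and pyrimidines balance, while the GC content of a given sequence is free to vary, in line with the species-dependent composition Chargaff observed.

```python
from collections import Counter

# Watson-Crick partner of each base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_counts(strand: str) -> Counter:
    """Count bases over both strands of a duplex built from `strand`."""
    partner = "".join(PAIR[base] for base in strand)
    return Counter(strand) + Counter(partner)

def gc_content(strand: str) -> float:
    """Fraction of G and C bases; this composition differs between species."""
    return (strand.count("G") + strand.count("C")) / len(strand)

seq = "ATGCGCGTTAACGGA"  # arbitrary example sequence (hypothetical)
counts = duplex_counts(seq)
purines = counts["A"] + counts["G"]
pyrimidines = counts["C"] + counts["T"]
print(purines, pyrimidines)       # equal for any duplex
print(round(gc_content(seq), 3))  # depends on the chosen sequence
```

Because every purine on one strand faces a pyrimidine on the other, the balance holds for any input sequence; only the GC content changes.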

James Watson and Francis Crick famously resolved the structure of DNA [24]. They relied heavily on the work of others: not only on Miescher, Levene, and Chargaff, but in particular on the X-ray crystallography work of Rosalind Franklin and Maurice Wilkins, which inspired them to imagine the double-helical structural model of DNA [25–28]. The research by Linus Pauling on molecular distances and bond angles was also crucial to their work [29–31]. A major insight was to realize that DNA consists of two complementary, antiparallel strands. Watson and Crick put together all the pieces of the puzzle that resulted in the structure of DNA.

Figure 1.2

The structure of DNA. a) The chemical structure of DNA, as originally published by Watson and Crick in 1953 [24], consists of a sugar-phosphate backbone and nucleobases that contain the genetic code. Figure reprinted from [24]. b) Schematic representation of base pairing through hydrogen bonds (dashed lines). Adenine (A) pairs with Thymine (T) by means of two hydrogen bonds, and Guanine (G) pairs with Cytosine (C) by means of three hydrogen bonds. Figure reprinted from Wikimedia Commons.

The structure that was resolved was of B-DNA, the most common form of DNA. Figure 1.2a depicts the original structure as it was published in 1953 [24].

This iconic figure describes the archetypical double-stranded helix, 2 nanometers (nm) in diameter. Watson and Crick identified five main characteristics of the DNA structure:

1. Each strand is composed of a rigid sugar-phosphate backbone decorated with a variable sequence of nucleobases.

2. The sugar-phosphate backbone has a polarity: each strand runs in the 5' to 3' direction (or reversed), which refers to the 5' and 3' carbon atoms of the pentose sugar to which the phosphate groups attach.

3. The two strands are of complementary sequence.

4. The helix has a pitch of 10.4 base pairs per turn, with asymmetric major and minor grooves. The asymmetry allows certain sequences to be recognized by proteins such as restriction enzymes. The outer edges of the nucleobases are exposed and available for hydrogen bonding, for instance with (regulatory) proteins.

5. B-DNA is right-handed. A-DNA, a more compact form, is also right-handed, whereas Z-DNA is an uncommon left-handed form with different mechanical and structural properties.
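Characteristics 2 and 3 together imply that the partner strand, read in its own 5' to 3' direction, is the reverse complement of the first strand. A minimal Python sketch, with a hypothetical function name and an arbitrary sequence:

```python
# Translation table for Watson-Crick complements: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand_5to3: str) -> str:
    """Return the partner strand, also written in the 5' -> 3' direction."""
    # Complement each base, then reverse the order, because the partner
    # strand runs antiparallel to the input strand.
    return strand_5to3.upper().translate(COMPLEMENT)[::-1]

top = "ATGCGTAC"
bottom = reverse_complement(top)
print(f"5'-{top}-3'")
print(f"3'-{bottom[::-1]}-5'")  # partner aligned base-by-base under the top strand
print(f"partner read 5'->3': {bottom}")
```

The reversal step is what encodes the antiparallel orientation; complementing alone would give the partner strand read from 3' to 5'.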

Since its publication in 1953, there have only been minor changes to the iconic structural model of DNA.

The genetic code is composed of a sequence of nucleobases. Figure 1.2b illustrates each of the nucleobases and their complementary pairing: A is always paired with T by means of two hydrogen bonds, while C is always paired with G by means of three hydrogen bonds. The different number of hydrogen bonds indicates that the stability of base pairs is not equal: the bond strength between C and G is approximately 0.75 kBT (-21 kcal/mol), whereas the bond strength between A and T is 0.5 kBT (-13 kcal/mol) under physiological conditions¹ [33].
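As a crude illustration of why GC-rich DNA is held together more strongly, the sketch below simply counts inter-strand hydrogen bonds, using two per A-T pair and three per G-C pair. It is an assumption-laden toy with a hypothetical function name: it ignores stacking between neighboring base pairs, which also contributes substantially to duplex stability, and it is not a model of the energies quoted above.

```python
# Hydrogen bonds contributed by each base once it is paired (A-T: 2, G-C: 3).
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def hydrogen_bonds(strand: str) -> int:
    """Total inter-strand hydrogen bonds when `strand` is fully base-paired."""
    return sum(H_BONDS[base] for base in strand.upper())

at_rich = "ATATATAT"
gc_rich = "GCGCGCGC"
print(hydrogen_bonds(at_rich), hydrogen_bonds(gc_rich))
```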

¹ kBT is the product of the Boltzmann constant and the temperature, which at room temperature equals 4.11 × 10⁻²¹ J. kBT is frequently used as a unit of energy in biophysics.

For example, the interaction energy between nucleosomes (25 kBT) or between archaeal histone proteins (2 kBT) can be expressed in kBT. Furthermore, the equipartition theorem states that each quadratic degree of freedom of an object carries 0.5 kBT of energy in thermal equilibrium [32].
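The value in footnote 1 can be reproduced from the exact SI constants. The sketch below assumes room temperature to be 298 K and also converts kBT into kcal/mol, the unit used for the base-pairing energies, via Avogadro's number and 4184 J per kcal; the function names are illustrative.

```python
K_B = 1.380649e-23     # Boltzmann constant in J/K (exact SI value)
N_A = 6.02214076e23    # Avogadro constant in 1/mol (exact SI value)
J_PER_KCAL = 4184.0    # thermochemical kilocalorie in joules

def kbt_joule(temperature_k: float) -> float:
    """Thermal energy per molecule, in joules."""
    return K_B * temperature_k

def kbt_kcal_per_mol(temperature_k: float) -> float:
    """Thermal energy per mole, in kcal/mol."""
    # Scale the per-molecule energy to one mole, then convert J to kcal.
    return kbt_joule(temperature_k) * N_A / J_PER_KCAL

T_ROOM = 298.0  # assumed room temperature in kelvin
print(f"kBT = {kbt_joule(T_ROOM):.3e} J")
print(f"kBT = {kbt_kcal_per_mol(T_ROOM):.3f} kcal/mol")
print(f"equipartition, per quadratic degree of freedom: {0.5 * kbt_joule(T_ROOM):.3e} J")
```

Dividing any molar energy by `kbt_kcal_per_mol(T_ROOM)` expresses it in units of kBT at that temperature.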


The strands of DNA are complementary. This redundancy ensures that the genetic information is backed up on the antiparallel strand. Watson and Crick showed foresight that turned out to be crucial for the understanding of genetics, inheritance, and evolution: "It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material" [24]. They understood that DNA contains all genetic information, and that the two copies of this information, stored on both strands, enable an efficient copying mechanism. What they could not yet comprehend, however, is how all this information is (selectively) accessed.

Information flows out of the nucleus of the cell by means of RiboNucleic Acid (RNA): a biopolymer similar to DNA. Both RNA and DNA are nucleic acids, composed of a sugar-phosphate backbone decorated with nucleobases. RNA, however, consists of a single strand, and in RNA thymine is replaced by uracil (U). Unlike DNA, RNA is stabilized by intramolecular interactions, resulting in unique higher-order structures.

The flow of information in the cell follows two distinct steps: transcription and translation.

1. Transcription: The genetic code is read out from the DNA by protein complexes such as RNA polymerase. During transcription, a pre-messenger RNA (pre-mRNA) transcript is synthesized. The pre-mRNA is spliced to remove non-coding segments (introns), and the mature mRNA transcript is exported from the cell nucleus.

2. Translation: The mRNA transcript is translated into protein by a huge protein-RNA complex, the ribosome. Aided by various chaperone proteins, the newly synthesized protein is folded into its desired, functional shape.
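The two steps above can be illustrated with a toy Python sketch. It assumes the input is the coding (non-template) strand, so transcription reduces to replacing T by U; RNA polymerase actually reads the template strand, which yields the same mRNA sequence. Splicing and folding are ignored, the function names are hypothetical, and `CODON_TABLE` is a deliberately tiny subset of the standard genetic code, so most real sequences would raise a `KeyError`.

```python
# Tiny subset of the standard genetic code (one-letter amino-acid codes).
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F", "GGC": "G", "AAA": "K", "GCU": "A",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError outside the subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

gene = "ATGTTTGGCAAAGCTTAA"  # hypothetical coding-strand sequence
mrna = transcribe(gene)
print(mrna, translate(mrna))
```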

The flow of information in the cell is frequently referred to as the Central Dogma of Molecular Biology, a term coined by Crick [34]. The dogma describes how information is transferred: from DNA to DNA (DNA replication), from DNA to RNA (transcription), and from RNA to protein (translation). Interestingly, RNA viruses follow an alternative route, from RNA to RNA (copying themselves), and from RNA to protein (translation). In both cases, these flows never seemed to go the other way around, hence the stated dogma.

The discovery of reverse transcriptase in 1970, which allows information to flow from RNA to DNA, challenged the dogma [35]. The DNA double-strand break repair mechanisms homologous recombination and non-homologous end joining undermined the dogma further. In the case of homologous recombination, the damaged genetic information of the cell is repaired using a template that is (preferably) the sister chromatid [36]. In the case of non-homologous end joining, the damaged single-stranded overhangs form the template, although there are even models advocating direct RNA-templated DNA repair [37].

In addition, retroviruses use RNA to alter the DNA sequence of their host [38].


In short, there is mounting evidence that alteration of DNA based on RNA sequence is not restricted to exceptional cases, but is a more general phenomenon in nature.

For a long time, Crick’s essential argument still held: there was no evidence that the information in a DNA sequence could be rewritten from the protein level [39]. In recent years, however, this notion was challenged with the discovery of the CRISPR-Cas9 system, essentially a prokaryotic immune system that directs controlled changes in DNA sequence guided by an RNA template. This system was successfully used to manipulate the human genome in 2013 [40–42].

Another mechanism that regulates the expression of genes is RNA interference (RNAi). RNAi is a process where small RNA molecules (microRNA and small interfering RNA) degrade mRNA, resulting in post-transcriptional gene silencing [43]. RNAi is a natural process that cells employ to destroy RNA-based viruses.

However, these non-coding RNAs can also modulate transcription, suggesting alternative mechanisms of genetic organization that are not contained in the central dogma of molecular biology.

1.3 Genetics and epigenetics

Genetics is the study of the information that is contained within the genome, genetic variation, and heredity in organisms. Studying the genome may provide information on the physical characteristics of an individual: e.g. whether someone has blue eyes or brown, is tall or short, or the presence and shape of a molecular motor. Studying the genome does not, however, reveal anything about the availability of the information to the individual cell. For this, one needs to know whether the genes are actually expressed, and this is regulated by epigenetics.

Epigenetics is the study of the regulation of genetics and of developmental changes caused by modification of gene expression. It can tell us how, where, and when the genetic information is accessed [44]. For example, during interphase, when the cell is not dividing, the cell primarily accesses the housekeeping genes: the genes coding for the proteins necessary to maintain the functionality of the cell. The situation changes when, for instance, external factors change, such as salt conditions, hypoxia, or after DNA damage. The cell has to react and swiftly access the genes that can counteract potentially perilous external factors. Even more substantial are the complicated epigenetic mechanisms at play during stem cell differentiation at different positions within the body. How does a cell know how to develop when it has the information of the entire genome available?


Figure 1.3

Chromatin in a cell nucleus exists in two forms: heterochromatin and euchromatin. A cell nucleus was stained to highlight the packing of the chromatin and imaged with transmission electron microscopy (TEM). Heterochromatin is the most densely packed, yielding darkly stained regions within the nuclear envelope. Euchromatin is less densely packed and actively transcribed, yielding the lightly stained regions. Figure reprinted from [45].

Cells modify the structure in which DNA is organized to keep genes repressed or active. In the nucleus, DNA is globally observed to be in one of two states: in tightly packed, gene-poor, transcriptionally silent heterochromatin, or in loosely packed, gene-rich, transcriptionally active euchromatin [46].

Figure 1.3 depicts the nucleus of a cell in which this bimodal behavior of DNA can be observed. Heterochromatin is highly condensed and appears as the darkly stained regions. Euchromatin is loosely condensed and is not readily stained. In general, denser DNA packing is indicative of less frequent transcription.

Epigenetic regulation within a multicellular organism varies at different locations in the body. Figure 1.4 depicts epigenetic regulation of a part of the genome during embryonic development. The cells in developing skin tissue transcribe different parts of the genome than developing cells in the lungs [47].

Figure 1.4

Epigenetics: gene regulation in action. The epigenetic activity of the human HOXA cluster on chromosome 7 shows a tight regulation of transcription in distinct parts of the body. The HOXA cluster is part of the Hox genes that specify the development of regions of the body. In other words, it contains a plan of the embryo and ensures that the correct structures form in the correct places of the body. The relative gene expression in the developing cells of the skin is very different from that of developing cells in the lung. This differentiation is thought to be mediated by changes in chromatin. Figure adapted from [47].

Dysfunction of epigenetic regulation results in various medical conditions.

For instance, type 1 diabetes, a condition in which epigenetic factors are implicated, is characterized by abnormally high blood sugar levels when the beta cells in the pancreas stop producing insulin [48]. Type 1 diabetes frequently develops during adolescence, demonstrating that the body has the genetic information to produce insulin, but due to a complicated interplay between genetics (heritability), environment (age, obesity, nutrients), and epigenetic factors (DNA/histone methylation), the body stops producing it. This results in the inability to use glucose for energy or to control the amount of sugar in the blood [49, 50]. At the base of this condition lies DNA compaction, which controls the accessibility of the DNA to the transcription machinery, effectively switching genes on or off.

1.4 Chromatin

For an organism to survive, DNA needs to be compacted in an organized manner. In every human cell approximately 2 meters of DNA needs to fit in the cell nucleus, which is on average only 6 µm in diameter, while retaining accessibility to sustain cell function. To accomplish this, the DNA is carefully compacted into higher-order structures such as chromatin, as schematically depicted in Figure 1.5.
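The figure of roughly 2 meters can be checked with a back-of-the-envelope calculation. The inputs below are assumptions not given in the text: about 6.4 × 10⁹ base pairs in a diploid human cell and a B-DNA rise of 0.34 nm per base pair. The 6 µm nuclear diameter is the value quoted above.

```python
# Assumed inputs (approximate textbook values, not from this thesis).
BP_DIPLOID = 6.4e9          # base pairs per diploid human cell
RISE_M = 0.34e-9            # helical rise per base pair of B-DNA, in meters
NUCLEUS_DIAMETER_M = 6e-6   # average nuclear diameter quoted in the text

# Contour length: number of base pairs times the rise per base pair.
contour_length_m = BP_DIPLOID * RISE_M
ratio = contour_length_m / NUCLEUS_DIAMETER_M
print(f"contour length ~ {contour_length_m:.2f} m")
print(f"length / nucleus diameter ~ {ratio:.1e}")
```

The ratio of contour length to nuclear diameter is only a crude length comparison, not a measure of the volumetric compaction achieved by chromatin.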


Figure 1.5

The compaction of DNA. To fit the genetic material in the nucleus, DNA is compacted in multiple steps. In the first level of compaction, DNA is wrapped around histone octamers to form nucleosomes in a beads-on-a-string conformation. Subsequently, the nucleosomes can interact to form 30-nm chromatin fibers. Ultimately, chromatin fibers can interact and condense to form chromosomes during mitosis, the highest level of compaction. Figure reprinted from [51].

In the first level of compaction, DNA wraps around a histone octamer to form a nucleosome. In the second level, nucleosomes interact to form dynamic chromatin fibers. In the third level, chromatin fibers interact and aggregate to form even higher-order structures to achieve further compaction. Despite several decades of research, there is still a lot of ambiguity about the structure or mechanical properties of these complexes. We know that during mitosis the DNA enters the highest form of compaction, where it is condensed 10,000 times into chromosomes. These characteristic structures are visible under the light microscope, however, many details on a molecular level are missing [52].

1.4.1 Nucleosomes

The fundamental unit of chromatin is the nucleosome. The nucleosome forms when nearly two turns of DNA wrap in a left-handed superhelix around a histone core. The core is made up of four pairs of histones: histone H2A combines with H2B, and histone H3 combines with histone H4. Two H3-H4 dimers assemble to form a tetramer onto which two H2A-H2B dimers dock to complete the histone octamer. In most organisms, the nucleosome is complemented by an H1 linker histone. Linker histones bind to specific DNA geometries and prefer nucleosomal DNA over free DNA [53]. Incomplete nucleosomes also exist: tetrasomes (two H3-H4 dimers) and hexasomes (two H3-H4 dimers + one H2A-H2B dimer). These incomplete nucleosomes are biologically relevant since they, for instance, relieve transcription-induced stress [54, 55].

Figure 1.6

The structure of the nucleosome. a) The nucleosome consists of a histone octamer core, built from two types of heterodimers: H2A–H2B (yellow-red) and H3–H4 (blue-green). All histones have tails that can interact with the DNA, adjacent histones, or adjacent nucleosomes. The DNA is omitted in this image. Figure reprinted from [56]. b) In the crystal structure of the nucleosome core particle at 2.8 Å resolution, the acidic patch in the H2A–H2B dimer (bright red) and the H4 tail from an adjacent nucleosome were resolved, indicating a mechanism for nucleosome stacking. Figure reprinted from [57].

Nucleosomes were first observed on EM images of chromatin by Olins et al. in 1974 [58]. In that same year, their structure, composed of a histone octamer core wrapped by approximately 200 base pairs of DNA, was proposed by Kornberg [59]. A high-resolution crystal structure of the nucleosome was subsequently resolved by Luger et al. in 1997 [57]. The nucleosome constrains 146 or 147 base pairs, and many different crystal structures of nucleosomes have been resolved, all featuring the same 1.65 wraps around the histone core [60]. Figure 1.6a depicts the histone octamer core with the highly disordered, and therefore unresolved, histone tails, omitting the DNA. In recent years, crystallography of single nucleosomes has been refined further, for instance by the work of Hitoshi Kurumizaka [61, 62]. He furthermore composed a song to commemorate the 20th anniversary of Luger’s landmark publication.
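Because the nucleosome core wraps about 147 base pairs, the length of the linker DNA follows from the nucleosome repeat length (NRL), the distance in base pairs from one nucleosome to the next, as used for the NRL 167 and NRL 197 fibers in Figure 1.8. The sketch below assumes a 147 bp core, a rise of 0.34 nm per base pair, and 10.4 bp per helical turn; these are standard B-DNA values, not parameters fitted in this thesis, and `linker_bp` is a hypothetical helper name.

```python
CORE_BP = 147       # DNA wrapped in one nucleosome core (assumed value)
RISE_NM = 0.34      # B-DNA rise per base pair, in nm
BP_PER_TURN = 10.4  # B-DNA helical repeat

def linker_bp(nrl_bp: int) -> int:
    """Linker DNA between two adjacent nucleosome cores, in base pairs."""
    if nrl_bp < CORE_BP:
        raise ValueError("NRL cannot be shorter than the core DNA")
    return nrl_bp - CORE_BP

for nrl in (167, 197):  # repeat lengths shown in Figure 1.8
    bp = linker_bp(nrl)
    print(f"NRL {nrl}: linker {bp} bp ~ {bp * RISE_NM:.1f} nm, "
          f"~{bp / BP_PER_TURN:.1f} helical turns")
```

The number of helical turns in the linker matters because it sets the relative rotation between consecutive nucleosomes, one of the geometric factors that shape how a fiber can fold.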

In eukaryotes, all histones consist of a globular part and a flexible tail, as becomes clear in Figure 1.6a. Histone tails fulfill a variety of functions in vivo, for instance in epigenetic gene regulation. The histone tails can undergo post-translational modifications (PTMs) such as acetylation, methylation, phosphorylation, and ubiquitination that directly or indirectly influence chromatin structure and control gene activity [63]. Acetylation and phosphorylation generally result in transcriptionally active genes, whereas methylation and ubiquitination are often associated with gene silencing² [66].

Histone tails are also strongly involved in the mechanical stability of nucleosomes and chromatin [67, 68]. The positively charged tails can interact with the nucleosomal DNA, the linker DNA, and acidic patches on the histone core by means of hydrogen bonds. Nearby nucleosomes can stack through interactions of the H4 tail with the H2A-H2B acidic patch of the adjacent nucleosome, as indicated in Figure 1.6b [57, 68–73]. The H3, H2A, and H2B tails may also contribute to nucleosome stacking [68]. Stacked nucleosomes can form regular chromatin fibers. At low ionic strength, nucleosome arrays adopt a beads-on-a-string configuration. However, under physiological conditions, and especially in the presence of divalent salt, the nucleosome chains are strongly compacted into organized chromatin fibers [74].

1.4.2 Chromatin fibers

The structure of the chromatin fiber has a strong effect on the level of compaction and accessibility of the DNA; hence, it is inherently related to epigenetics. There are two main structural models to describe chromatin fibers: solenoid fibers and zig-zag fibers. Solenoid fibers form when neighboring nucleosomes interact to create a single stack of nucleosomes (1-start structures). The solenoid fiber was first described by Thoma et al. in 1979 [75]. The original structural model of the solenoid fiber is depicted in Figure 1.7a. Zig-zag fibers form when next-nearest-neighbor nucleosomes interact to create a double stack of nucleosomes (2-start structures), which winds around itself (much like the double helix of DNA). The zig-zag fiber was first described by Woodcock et al. in 1984 [76]. The original structural model of the zig-zag fiber is depicted in Figure 1.7b.

² Epigenetic regulation is controlled not only by PTMs, but also by a complicated interplay of transcription factors, gene repressors, and environmental cues. The DNA itself can be methylated in a promoter region, which causes gene silencing. In addition, the cell employs several ATP-dependent remodeling complexes that displace, exchange, or evict histones from the chromatin fiber, for instance remodelers from the SWI2/SNF2 group or the SWI group [64]. Furthermore, the cell employs several histone chaperones: complexes that control the folding of free histones into nucleosomes, which work together with the remodelers on histone deposition and eviction [65]. Frequently, a combination of mechanisms is involved. For instance, DNA methylation in combination with histone deacetylation results in highly compacted chromatin and transcriptionally silent genes.

Figure 1.7

Two structural models have been proposed for the chromatin fiber. a) The solenoid fiber, as described in 1979 by Thoma et al. [75], forms a 1-start structure with adjacent nucleosomes stacked together. The presence of the H1 linker histone assists in the formation of the solenoid structure, and the ionic strength controls the degree of compaction. Figure reprinted from [75]. b) The zig-zag fiber, as described in 1984 by Woodcock et al. [76], forms a 2-start structure with nucleosomes stacking with their next-nearest neighbors. Figure reprinted from [76].

The prevalence of the solenoid fiber, the zig-zag fiber, or the absence of regular higher-order structures is still being debated today.

The experimental evidence for the zig-zag fiber is quite extensive, exemplified by the X-ray structures by Schalch et al. [77] and the cryo-EM structures by Song et al. [70], which reveal in great detail the structure of the tetranucleosomal units that constitute zig-zag fibers. Digestion experiments by Dorigo et al. also strongly indicated a 2-start structure [72]. The solenoid fiber has not been resolved by single-particle cryo-EM reconstruction or X-ray diffraction, perhaps due to the dynamic nature of this structure. Robinson et al. used EM to measure the diameter of folded chromatin fibers. The independence of this diameter from linker DNA length indicated a structure of compacted solenoid fibers [78, 79]. In addition, force spectroscopy experiments by our group provided evidence for solenoid fiber structures in chromatin fibers with long linker DNA [80, 81]. Cross-linking experiments by Kaczmarczyk et al. suggested that these higher-order structures are stacked by means of the H4 tail interacting with an acidic patch on a neighboring nucleosome, arguing in favor of the solenoid fiber [82].

Figure 1.8

Linker DNA plays an important role in the higher-order structure of chromatin fibers. a) Nucleosome core particles without linker DNA tend to form stacks that describe long arcs, as shown by the EM data. Figure reprinted from [83]. b) Chromatin with relatively short linker DNA, for instance NRL 167, forms a zig-zag structure in which the linker DNA is approximately straight. c) The relatively long linker DNA in NRL 197 can accommodate a solenoid structure. Panels b) and c) depict chromatin fibers in a force spectroscopy experiment, flanked by long handles of bare DNA, simulated using rbMC (described in Chapter 4). Two nucleosomes in the solenoid fiber have unstacked because of the exerted force.

The linker DNA, the DNA between adjacent nucleosomes, plays a crucial role in the higher-order structure of chromatin. By connecting the nucleosomes, linker DNA may impose geometrical constraints that affect higher-order folding. Dubochet et al. showed in 1978 that nucleosome core particles lacking linker DNA stack into long arcs, as depicted in Figure 1.8a [83]. These experiments reveal the preferred stacking orientation of nucleosomes in the absence of constraints imposed by linker DNA.

The introduction of nucleosome positioning sequences, such as the Widom 601 sequence [84], made it possible to reconstitute well-defined nucleosomal arrays, which allowed for a systematic analysis of the effect of linker length on fiber structure. Grigoryev et al. used sedimentation analysis and EM to quantify the compaction of chromatin as a function of linker length in 5 base pair steps. The experiments suggested strong compaction into 2-start structures, especially for short linker lengths [85–87]. Previous work by our group revealed that fiber stiffness and unfolding characteristics also depend on linker length, suggesting a 2-start structure for short linker lengths and a 1-start structure for long linker lengths [80–82]. The two alternative higher-order structures are depicted in Figures 1.8b and 1.8c, which illustrate the deformation of the linker DNA required to sustain such structures.

In vivo, linker length is quantized, often in multiples of 10 base pairs, corresponding to the rotational pitch of DNA [88]. Linker DNA length correlates with the level of compaction. Fibers with mostly 10n linker length (n = 1, 2, . . . ) are strongly condensed and are observed in transcriptionally silent chromatin [85, 89]. In contrast, fibers with predominantly 10n + 5 linker length are less condensed and often correspond to transcriptionally active genes [90–92].
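The rotational-phasing argument behind the 10n versus 10n + 5 distinction can be illustrated geometrically: count how far the DNA helix rotates over a straight linker. The sketch below is a minimal illustration under strong assumptions. The linker is taken as straight and undeformed, nucleosome geometry is ignored, and the helical repeat is a hypothetical parameter `bp_per_turn`. Its default of 10.0 follows the round number in the text; free B-DNA is closer to 10.4 to 10.5 bp per turn.

```python
def linker_rotation_deg(linker_bp: int, bp_per_turn: float = 10.0) -> float:
    """Helical rotation (degrees, folded into [0, 360)) accumulated over a
    straight linker of `linker_bp` base pairs.

    `bp_per_turn` is an assumed helical repeat; 10.0 follows the round number
    used in the text, while free B-DNA is closer to 10.4-10.5 bp per turn.
    """
    return (360.0 * linker_bp / bp_per_turn) % 360.0


# 10n linkers return to the same rotational phase; 10n + 5 linkers are
# rotated by half a turn (with the default 10 bp per turn).
for bp in (20, 25, 30, 35):
    print(bp, linker_rotation_deg(bp), round(linker_rotation_deg(bp, 10.4), 1))
```

With a non-integer repeat such as 10.4 bp per turn, the phase no longer returns exactly to zero, so this toy calculation only captures the half-turn offset between the two linker families.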

Besides the debate on the topology of the fiber, there is also no consensus on its occurrence in vivo [93]. The group of Kazuhiro Maeshima argues that the structures found in vitro do not naturally occur in vivo. They imaged vitrified human mitotic cells in a semi-native state using cryo-EM and observed no structures with a diameter larger than 11 nm. They concluded that chromatin must be highly disorganized in vivo, which permits a more dynamic and flexible genome organization than regular structures would allow [94–96]. Recent work, however, contradicts their view. Risca et al. measured contacts between alternating nucleosomes using RICC-seq, which suggests the presence of 2-start structures in vivo [97]. Based on data from EM and chromatin capture techniques, Fierz et al. argued that, if the chromatin fiber exists in vivo, 30-nm structures may occur over only a few kilobases, or tens of nucleosomes [98, 99].

On a scale of tens of kilobases to a few megabases, the mammalian genome is further organized into topologically associating domains (TADs) [100, 101]. These domains occupy discrete positions within the nucleus. Hi-C profiles show that, throughout the different phases of the cell cycle and during cell differentiation, different parts of the genome associate with each other, suggesting epigenetic regulation on a much larger scale than that of the chromatin fiber [102]. An important factor in TAD formation and maintenance is the transcriptional repressor CCCTC-binding factor (CTCF), which brings together remote strands of DNA, forming chromatin loops, and anchors these loops to cellular structures such as the nuclear lamina. In the nucleus, CTCF co-localizes with cohesin, a ring-shaped multi-protein complex that can trap DNA within its ring. The inner diameter of the ring is approximately 40 nm, which can physically accommodate compacted chromatin fibers [103]. CTCF and cohesin define the boundaries between transcriptionally active euchromatin and heterochromatin, and there is growing evidence that TADs are formed by active extrusion of chromatin loops by cohesin [104]. A future challenge will be to illuminate the relation between epigenetic regulation at the level of the chromatin fiber and at the level of TADs, as the mechanisms that underlie the formation of TADs are poorly understood.

1.5 Force spectroscopy experiments

It appears that the discussion about the structure of chromatin fibers, in vitro or in vivo, is still ongoing, and structural measurements at the level of the chromatin fiber are lacking. In this thesis, I aim to contribute to this discussion by applying single-molecule force spectroscopy to in vitro reconstituted chromatin fibers. Single-molecule force spectroscopy is a powerful tool to study interactions, binding forces, and mechanical parameters of biopolymers. The extension of the molecule of interest is characterized as a function of force, which can be applied by a variety of techniques, such as magnetic tweezers (MT), optical tweezers (OT), atomic force microscopy (AFM), or acoustic force spectroscopy (AFS) [105, 106].

With these techniques, it has become possible to resolve the composition of individual chromatin fibers, the forces that hold DNA and histones together, and the interactions between nucleosomes. In 2000, Cui et al. used OT to stretch the first (native) chromatin fiber. They described the mechanical properties of chromatin fibers recovered from chicken erythrocytes [107], but were unable to identify the unfolding of individual nucleosomes in the force-extension curves. In 2001, Bennink et al. used OT to stretch chromatin fibers that were reconstituted in situ on λ-DNA [108]. They observed discrete transitions that were attributed to the unfolding of full nucleosomes; however, the step sizes varied. In 2002, Brower-Toland et al. examined the unfolding of regular, in vitro reconstituted nucleosomal arrays with a feedback-enhanced OT and were the first to resolve two transitions in the unfolding of individual nucleosomes: the outer turn (76 base pairs) unfolded at low forces, whereas the inner turn (80 base pairs) required higher forces [109].

Kruithof et al. used MT to analyze the low-force transition in chromatin fibers and resolved a highly compliant helical folding of the 30-nm chromatin fiber [80]. Their research laid the foundation for the statistical mechanics model developed by Meng et al. [81], which identified a previously undetected third nucleosomal conformation, the extended conformation, between the singly wrapped and the completely unwrapped conformations. In this publication, the relationship between linker DNA length and fiber structure was first explored with MT. In 2016, Li et al. separated the unstacking transition from the unwrapping of the outer DNA turn with MT by stretching chromatin fibers in zero-salt conditions [110].

In our group, Hermans et al. used multiplexed MT to study native chromatin [111]. These fibers, composed of 18S ribosomal DNA, were extracted from the nucleus of the yeast Saccharomyces cerevisiae using Locked Nucleic Acid (LNA) probes. Since native chromatin is assembled in vivo, it is highly heterogeneous due to PTMs of the histones and the absence of nucleosome positioning sequences. Therefore, its analysis requires more statistics to recover common features. Hermans et al. measured a step size of 24.8 nm in the stepwise unwrapping transition, which was comparable to that of in vitro reconstituted chromatin.

In this thesis, we used multiplexed MT to thoroughly examine the effect of linker DNA length on the mechanical parameters and higher-order structure of eukaryotic chromatin (see Chapter 5). Furthermore, we stretched archaeal chromatin, composed of HMfA or HMfB proteins, to infer their higher-order structures (see Chapter 6). Figure 1.9 depicts our experimental setup as described by Kaczmarczyk et al. [112]. During a typical experiment, a chromatin fiber, flanked by bare DNA handles, is tethered to a glass cover slip by means of a digoxigenin – anti-digoxigenin interaction. The other end of the molecule is tethered to a paramagnetic bead by means of a streptavidin – biotin interaction.

Mounted above the sample is a set of horizontally oriented magnets that yield a uniform magnetic field over the entire field-of-view [113]. The force on the beads decays approximately exponentially with the height of the magnets above the sample; lowering the magnets thus increases the force, which causes the tethered DNA-chromatin complex to stretch and unfold. In addition, the magnetic field can be rotated to exert torque, which can build up in the tether if it is torsionally constrained. Torque has a distinct effect on the structure of DNA and chromatin [81, 114–116]. In this thesis, we used torque to stabilize the archaeal HMfB fiber (see Chapter 6).
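The exponential dependence of force on magnet position can be written as a simple model. Inverting it gives the magnet position that yields a target force, for instance to program a force ramp. The calibration constants `F_MAX_PN` and `DECAY_MM` below are hypothetical placeholders, not values from this setup. In practice they must be calibrated for the specific magnets and bead type, and a single exponential is only the simplest assumed form.

```python
import math

# Hypothetical calibration constants (NOT values from this thesis):
F_MAX_PN = 80.0  # force with the magnets at the closest position (pN), assumed
DECAY_MM = 1.0   # decay length of the force with magnet height (mm), assumed


def force_pn(magnet_height_mm: float) -> float:
    """Force on the bead for a given magnet height above the closest position."""
    return F_MAX_PN * math.exp(-magnet_height_mm / DECAY_MM)


def height_for_force(target_pn: float) -> float:
    """Invert the single-exponential model: magnet height (mm) for a target force."""
    return -DECAY_MM * math.log(target_pn / F_MAX_PN)


# Example: magnet heights for a logarithmically spaced force ramp.
for f in (0.5, 2.0, 8.0, 32.0):
    print(f, round(height_for_force(f), 3))
```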

1.6 Particle tracking

The diameter of DNA (2 nm) or chromatin (30 nm) is too small to resolve with a light microscope. Hence, to detect structural changes in chromatin fibers we measure their extension by tracking the position of the paramagnetic bead attached to the tether. In our experimental setup, we use a collimated LED to illuminate our beads, resulting in a characteristic diffraction pattern that is recorded by a camera. This circular hologram arises from interference between the incident beam and the beam scattered off the bead. The size of the diffraction pattern scales with bead height. Therefore, the three-dimensional position of the magnetic beads can be deduced from the two-dimensional hologram.

Figure 1.9

Chromatin fibers are stretched by a multiplexed magnetic tweezers apparatus. The setup was able to measure hundreds of chromatin molecules in parallel. The inset shows a chromatin fiber, flanked by long handles of bare DNA, tethered to a paramagnetic bead (not to scale). Image reprinted from [112].

There is a plethora of bead tracking techniques available for MT. When only the in-plane position of the bead is needed, the xy-position of the bead can, for instance, be resolved by a center-of-mass calculation [117–119] or by fitting the diffraction pattern with a Gaussian profile [120–122]. These algorithms are simple and fast, and yield nm precision in two dimensions [123].
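The center-of-mass approach amounts to an intensity-weighted average of pixel coordinates. The sketch below is a minimal version that assumes a single bright spot on a roughly uniform background. The median background subtraction and clipping are illustrative preprocessing choices, not the procedure used in this thesis.

```python
from typing import Tuple

import numpy as np


def center_of_mass_xy(image: np.ndarray) -> Tuple[float, float]:
    """Intensity-weighted centroid (x, y) of a bead image, in pixel units."""
    img = image.astype(float) - np.median(image)  # crude background estimate
    img = np.clip(img, 0.0, None)                 # ignore pixels below background
    total = img.sum()
    if total <= 0:
        raise ValueError("image contains no signal above background")
    ys, xs = np.indices(img.shape)                 # row index = y, column index = x
    return float((xs * img).sum() / total), float((ys * img).sum() / total)


# Usage: a synthetic Gaussian spot centered at x = 12.3, y = 20.7 pixels.
ys, xs = np.indices((40, 40))
spot = np.exp(-((xs - 12.3) ** 2 + (ys - 20.7) ** 2) / (2 * 2.0 ** 2))
print(center_of_mass_xy(spot))
```

Because the estimate is a plain weighted sum over all pixels, it is fast enough to run on many regions of interest per frame.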

Further analysis of the hologram is required to extract the three-dimensional position of the bead. Lee et al. fitted the diffraction pattern with Lorentz–Mie scattering theory, which resolved the refractive index as well as the three-dimensional position of the bead with nm precision, at the cost of being rather computationally intensive [124].

A more frequently implemented algorithm in MT is based on the autocorrelation method introduced by Gelles et al. [115, 123, 125–128]. The diffraction pattern of a bead is cross-correlated with its mirror image, a predefined kernel, or a previous image. The cross-correlation yields a peak corresponding to the xy-position of the bead. Around this position, a radial intensity profile is constructed. This profile can be compared to a pre-measured look-up table (LUT) to extract the z-position of the bead. The cross-correlation method combines nm accuracy with speed and simplicity. Nevertheless, without further optimization it can only process a couple of beads in real time, and there is cross-talk between the obtained coordinates in the three dimensions. In Chapter 3 we describe a new tracking method that is lightweight and robust, and therefore optimized for multiplexed experiments.
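The three steps of this cross-correlation approach (find xy, build a radial profile, match it against a LUT) can be sketched as below. The sketch makes several assumptions. The xy-position comes from correlating the image with its point-mirrored copy, computed here as a self-convolution via FFT with a parabolic sub-pixel refinement. The radial profile uses one-pixel-wide rings. The LUT match is a least-squares comparison, again refined with a parabola. The function names are hypothetical, and the synthetic "hologram", a Gaussian whose width grows with z, is only a stand-in for a real diffraction pattern. This is one illustrative variant, not the tracking method of Chapter 3.

```python
import numpy as np


def _parabolic_offset(m, p, q):
    """Sub-sample offset of the vertex of a parabola through (-1, m), (0, p), (1, q)."""
    denom = m - 2.0 * p + q
    return 0.0 if denom == 0 else 0.5 * (m - q) / denom


def xy_mirror_correlation(img):
    """Bead center (x, y) in pixels from the correlation with the mirrored image.

    Correlating I(r) with its mirror image I(-r) is equivalent to the
    self-convolution of I, which peaks at twice the symmetry center of I.
    """
    img = img.astype(float) - np.median(img)  # remove background offset
    ny, nx = img.shape
    shape = (2 * ny - 1, 2 * nx - 1)           # zero-padding: linear, not circular
    spec = np.fft.rfft2(img, s=shape)
    conv = np.fft.irfft2(spec * spec, s=shape)
    py, px = np.unravel_index(np.argmax(conv), conv.shape)
    dy = _parabolic_offset(conv[py - 1, px], conv[py, px], conv[py + 1, px])
    dx = _parabolic_offset(conv[py, px - 1], conv[py, px], conv[py, px + 1])
    return (px + dx) / 2.0, (py + dy) / 2.0


def radial_profile(img, xc, yc, n_bins=15):
    """Azimuthally averaged intensity in 1-pixel-wide rings around (xc, yc)."""
    ys, xs = np.indices(img.shape)
    ring = np.hypot(xs - xc, ys - yc).astype(int)
    keep = ring < n_bins
    sums = np.bincount(ring[keep], weights=img[keep], minlength=n_bins)
    counts = np.bincount(ring[keep], minlength=n_bins)
    return sums / np.maximum(counts, 1)


def z_from_lut(profile, lut_profiles, lut_z):
    """Height whose LUT profile best matches `profile` (least squares + parabola).

    Assumes `lut_z` is evenly spaced and has at least three entries.
    """
    err = ((lut_profiles - profile) ** 2).sum(axis=1)
    i = int(np.clip(np.argmin(err), 1, len(lut_z) - 2))
    offset = _parabolic_offset(err[i - 1], err[i], err[i + 1])
    return lut_z[i] + offset * (lut_z[1] - lut_z[0])


# Usage with a hypothetical pattern model: a Gaussian spot that widens with z.
def synthetic_image(z, xc=12.3, yc=20.7, size=40):
    ys, xs = np.indices((size, size))
    width = 2.0 + 0.3 * z
    return np.exp(-((xs - xc) ** 2 + (ys - yc) ** 2) / (2 * width ** 2))


lut_z = np.arange(0.0, 10.5, 0.5)
lut = np.array([radial_profile(synthetic_image(z), 12.3, 20.7) for z in lut_z])
frame = synthetic_image(3.3)
x, y = xy_mirror_correlation(frame)
print(x, y, z_from_lut(radial_profile(frame, x, y), lut, lut_z))
```

The LUT is built at the same bead position as the measurement, mirroring the per-bead calibration described above; an error in the xy-estimate propagates into the radial profile and hence into z, which is one source of the cross-talk mentioned in the text.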

1.6.1 Stretching DNA

To describe the mechanical properties of chromatin, we first need a clear understanding of the mechanical properties of DNA. As physicists, we can simplify the underlying molecular complexity of DNA and view it as a polymer characterized by its bending stiffness. This stiffness is often expressed in terms of the persistence length P. Formally, the persistence length of a polymer is the length over which statistical correlations in the direction of the polymer are lost. This implies that a polymer much longer than its persistence length adopts a random coil conformation, whereas a polymer shorter than its persistence length can be described as a straight rod. Alternatively, the persistence length can be defined as the length over which a bend of 1 radian can be made at an energy cost of kBT [129]. Whereas most other biopolymers have a persistence length smaller than 1 nm, double-stranded DNA, with P = 50 nm, is considered a semi-flexible polymer [130]. When a polymer is subjected to a sufficiently high force, its backbone starts to extend beyond its contour length. The extensibility of a polymer is expressed by its stretch modulus S. This parameter describes the force that would stretch the polymer to twice its contour length, and equals approximately 1000 piconewton (pN) for DNA in physiological conditions [81, 131, 132]. These two parameters, P and S, are sufficient to describe the mechanical response of DNA to force F.

Two mathematical models are frequently used to describe the mechanical response of biopolymers to increasing force. To characterize DNA at forces up to several pN, the freely-jointed chain model (FJC) can be used. The FJC describes a polymer as a chain of N freely jointed monomers of fixed length b (the Kuhn length) that follow random-walk statistics. The contour length of the polymer is L = bN, and the root-mean-square end-to-end distance of the chain is b√N. The extension z_FJC is described by

$$z_{\mathrm{FJC}}(F) = L\left[\coth\left(\frac{Fb}{k_BT}\right) - \frac{k_BT}{Fb}\right], \qquad (1.1)$$

where the Kuhn length b can be expressed as 2P [133]. For low forces, Fb ≪ kBT, the effective spring constant k follows [134]

$$k = \frac{3k_BT}{bL}. \qquad (1.2)$$
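Equations (1.1) and (1.2) translate directly into code. The sketch assumes kBT = 4.1 pN·nm (room temperature) and uses the Kuhn length of a worm-like chain, which is twice its persistence length (b ≈ 100 nm for DNA). The 1000 nm contour length in the usage lines is an arbitrary example. Near zero force, the Langevin function coth(x) − 1/x is replaced by its series x/3; that same limit gives z ≈ F·bL/(3kBT) and hence the spring constant of Eq. (1.2).

```python
import math

KBT = 4.1  # thermal energy at room temperature (pN*nm), assumed


def langevin(x: float) -> float:
    """coth(x) - 1/x, using the small-x series to avoid division by zero."""
    if abs(x) < 1e-4:
        return x / 3.0
    return 1.0 / math.tanh(x) - 1.0 / x


def z_fjc(force_pn: float, contour_nm: float, kuhn_nm: float) -> float:
    """Mean extension (nm) of a freely-jointed chain, Eq. (1.1)."""
    return contour_nm * langevin(force_pn * kuhn_nm / KBT)


def k_fjc(contour_nm: float, kuhn_nm: float) -> float:
    """Low-force spring constant (pN/nm), Eq. (1.2)."""
    return 3.0 * KBT / (kuhn_nm * contour_nm)


# Example: 1000 nm of DNA with a Kuhn length of 100 nm.
for f in (0.01, 0.1, 1.0):
    print(f, round(z_fjc(f, 1000.0, 100.0), 1))
print(k_fjc(1000.0, 100.0))
```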

The FJC, however, does not accurately capture the stretching of DNA when the force is higher [129]. For a more accurate description of a larger force regime, DNA can be modeled with the worm-like chain model (WLC). The WLC, also known as the elastic rod model or the continuous Kratky-Porod model [132], introduces cooperativity between the neighboring monomers and describes the molecule as a flexible rod that curves smoothly as a result of thermal fluctuations. The extension z_WLC is described by

$$z_{\mathrm{WLC}}(F) = L\left[1 - \frac{1}{2}\sqrt{\frac{k_BT}{FP}} + \frac{F}{S}\right]. \qquad (1.3)$$

The last term extends the WLC to include the stretch modulus S. The WLC, depicted in Figure 1.10 by the dark dashed line, describes the extension of DNA under forces below 65 pN to a good approximation and is used throughout most of this thesis.
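Equation (1.3) can be sketched as follows, assuming kBT = 4.1 pN·nm and the DNA values quoted above (P = 50 nm, S = 1000 pN). Because this expression is a high-force expansion, it is only meaningful for forces well above kBT/P ≈ 0.08 pN and below about 65 pN. As F approaches zero, the square-root term diverges and the predicted extension becomes negative.

```python
import math

KBT = 4.1  # thermal energy at room temperature (pN*nm), assumed


def z_wlc(force_pn: float, contour_nm: float,
          persistence_nm: float = 50.0, stretch_modulus_pn: float = 1000.0) -> float:
    """Extension (nm) of the extensible worm-like chain, Eq. (1.3).

    High-force expression: use only for F >> kBT/P and below ~65 pN.
    """
    entropic = 0.5 * math.sqrt(KBT / (force_pn * persistence_nm))
    enthalpic = force_pn / stretch_modulus_pn
    return contour_nm * (1.0 - entropic + enthalpic)


# Example: relative extension of 1000 nm of DNA at a few forces.
for f in (0.5, 2.0, 10.0, 60.0):
    print(f, round(z_wlc(f, 1000.0) / 1000.0, 3))
```

At 60 pN the enthalpic term already pushes the extension beyond the contour length, consistent with the enthalpic stretching regime described next.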

When DNA is stretched, its force-extension curve spans several regimes. The first regime, the linear entropic elasticity regime, occurs below the characteristic force F = kBT/P ≈ 0.08 pN. Here, the molecule behaves as a Hookean spring [129]. The second regime, the non-linear entropic elasticity regime, is observed when this characteristic force is exceeded and the reduction of conformational entropy determines the extension. The force-extension curve describes a characteristic arc, in which a few pN suffice to extend the molecule to a large fraction of its contour length. The third regime, the enthalpic stretching regime, becomes apparent for forces larger than 10 pN, where the extension approaches the contour length. Here the DNA itself deforms, causing the extension to exceed the contour length [135, 136].

1.6.2 Stretching chromatin

Stretching chromatin is far more complicated than stretching DNA. To illustrate this, imagine stretching a chromatin fiber with 20 nucleosomes. We then need to describe the behavior of 162 molecules (20 × 8 histones + 2 DNA strands). Moreover, any histone can detach from the fiber at any given moment. In our experimental approach, the chromatin fiber is furthermore flanked by bare DNA handles. Hence, the total measured extension is the sum of the extension of the handles, dictated by the WLC, and the extension of the fiber, which is composed of nucleosomes that can adopt various conformations.

Figure 1.10

Force spectroscopy experiments resolve the higher-order structure of chromatin. A typical force-extension curve measured with the multiplexed MT setup, depicting the characteristic unfolding of a chromatin fiber, from which the fiber stiffness k, the rupture energy ∆G1, and the partial unwrapping energy ∆G2 could be extracted. Each nucleosome conformation and its corresponding extension is depicted in the inset. The colors of the scatter plot roughly correspond to the different conformations.

When chromatin fibers are unfolded by force, their individual nucleosomes progress through a sequence of transitions and conformations, as illustrated in the inset of Figure 1.10. In the case of eukaryotic chromatin, the nucleosome conformations are: (I) the fully wrapped fiber conformation, (II) the partially unwrapped nucleosome conformation, (III) the singly wrapped nucleosome conformation, and (IV) the fully unwrapped nucleosome conformation. To complete these transitions, the work done by the stretching force must successively balance the rupture energy ∆G1, the interaction energy ∆G2, and the wrapping energy ∆G3. The first two transitions occur in thermodynamic equilibrium, but the third does not, as illustrated by the stochastic stepwise behavior of this transition.

The force-extension curve for the unfolding of a complete chromatin fiber with 20 nucleosomes is shown in Figure 1.10. At low forces, the 20 nucleosomes that comprise this fiber are stacked and fully wrapped. In this regime, the fiber extends linearly with force. Increasing the force causes the nucleosomes to unstack and ultimately to unwrap all nucleosomal DNA, until the curve follows the WLC of the bare DNA tether.

Because so many factors play a role in unfolding chromatin, modeling it quickly becomes complex. This is, however, where statistical physics provides a solution. The total set of nucleosomes, with each nucleosome in a specific conformation, constitutes a state with its own extension and free energy. The total extension z_tot(F) and total free energy g_tot(F) of a state are calculated by adding the extensions and free energies of all nucleosomes in that state. The force-extension curve can then be fitted with the Boltzmann-weighted sum over all states:

$$\langle z_{\mathrm{tot}}(F)\rangle = \frac{1}{Z}\sum_i z_i(F)\,\exp\left(-\frac{g_i(F) + W}{k_BT}\right), \qquad (1.4)$$

where z_i(F) and g_i(F) are the extension and the free energy of state i, and Z is the partition function

$$Z = \sum_i \exp\left(-\frac{g_i(F) + W}{k_BT}\right).$$

As z_i(F) and g_i(F) depend on the mechanical properties of the nucleosomes in the fiber, fitting this equation yields the fiber stiffness k, the rupture energy ∆G1, and the interaction energy ∆G2. In Chapter 5 we characterize a variety of chromatin fibers with different linker lengths, linker sequences, histone variants, and numbers of repeats.
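A toy version of Eq. (1.4) makes the bookkeeping concrete. The sketch makes three assumptions for illustration only. Each nucleosome has just two hypothetical conformations, stacked and unstacked, with force-independent extensions and free energies. The work term for state i is taken as W = −F·z_i, which the text does not spell out. The WLC handle length is arbitrary. All 2^N states are enumerated explicitly, which is only feasible for a handful of nucleosomes. For identical, independent nucleosomes the result must equal N times the single-nucleosome average, which gives a simple consistency check.

```python
import itertools
import math

KBT = 4.1  # pN*nm, assumed

# Hypothetical conformations: (extension in nm, free energy in units of kBT)
CONFORMATIONS = [(1.5, 0.0), (10.0, 10.0)]  # stacked, unstacked


def mean_fiber_extension(force_pn: float, n_nucleosomes: int) -> float:
    """Boltzmann-weighted mean extension over all conformational states, Eq. (1.4)."""
    log_weights, extensions = [], []
    for state in itertools.product(CONFORMATIONS, repeat=n_nucleosomes):
        z_i = sum(z for z, _ in state)        # extension of state i (nm)
        g_i = KBT * sum(g for _, g in state)  # free energy of state i (pN*nm)
        w_i = -force_pn * z_i                 # assumed work term W
        log_weights.append(-(g_i + w_i) / KBT)
        extensions.append(z_i)
    shift = max(log_weights)                  # log-sum-exp trick avoids overflow
    weights = [math.exp(lw - shift) for lw in log_weights]
    return sum(w * z for w, z in zip(weights, extensions)) / sum(weights)


def total_extension(force_pn: float, n_nucleosomes: int, handle_nm: float = 1000.0) -> float:
    """Handles (extensible WLC, P = 50 nm, S = 1000 pN) plus the fiber."""
    handle = handle_nm * (1.0 - 0.5 * math.sqrt(KBT / (force_pn * 50.0)) + force_pn / 1000.0)
    return handle + mean_fiber_extension(force_pn, n_nucleosomes)


for f in (1.0, 4.0, 6.0, 10.0):
    print(f, round(total_extension(f, 8), 1))
```

In a realistic model the state extensions themselves depend on force (fiber stiffness, linker DNA), and interactions between nucleosomes prevent the simple factorization used in the check.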

The same approach can be followed to describe the extension of other protein-DNA complexes under force. In Chapter 6 we have identified the various conformations of the HMf dimer that compose a hypernucleosome, and developed an analogous statistical mechanics model that revealed the stacking energy g_stack and the wrapping energy g_wrap of HMfA- and HMfB-DNA complexes.

1.7 Rigid base pair Monte Carlo simulations

To complement our force spectroscopy experiments, we performed Monte Carlo (MC) simulations of single chromatin fibers. MC simulations are well suited to study complicated stochastic physical systems. By coarse-graining such systems, we can simulate them on experimentally accessible length and time scales while retaining their important physical properties. These simulations complement the experimental data by providing structural insights that cannot be obtained directly from force spectroscopy experiments alone.
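Monte Carlo simulations of this kind typically accept or reject trial moves with the Metropolis rule. The sketch below is a minimal, hypothetical illustration, not the simulation code of Chapter 4. It samples a single harmonic coordinate x, for instance one twist angle, with energy E = ½κx² in units of kBT. By equipartition, the sampled variance should approach 1/κ, which provides a simple correctness check.

```python
import math
import random


def metropolis_harmonic(kappa: float, n_steps: int = 200_000,
                        step: float = 1.5, seed: int = 1) -> list:
    """Metropolis samples of x with energy E(x) = 0.5 * kappa * x**2 (in kBT)."""
    rng = random.Random(seed)
    x, energy = 0.0, 0.0
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # symmetric trial move
        e_new = 0.5 * kappa * x_new ** 2
        # Accept downhill moves always, uphill moves with probability exp(-dE).
        if e_new <= energy or rng.random() < math.exp(energy - e_new):
            x, energy = x_new, e_new
        samples.append(x)                     # a rejected move repeats the old x
    return samples


samples = metropolis_harmonic(kappa=2.0)
variance = sum(s * s for s in samples) / len(samples)  # mean is 0 by symmetry
print(variance)  # equipartition predicts 1 / kappa = 0.5
```

In a coarse-grained DNA or chromatin model, the single coordinate is replaced by many coupled step parameters and the energy by the full elastic and interaction energy, but the acceptance step keeps this form.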

Chromatin fibers have been simulated before with varying degrees of complexity. For instance, De Bruin et al. [137], Dobrovolskaia et al. [138], and Lequieu et al. [139] detailed the unwrapping of single nucleosomes. Kepper et al. included interacting nucleosomes as single units with straight linker DNA [140]. Collepardo-Guevara and Schlick modeled flexible linker DNA between nucleosomes but did not allow nucleosomal DNA unwrapping [141, 142]. The recent work of Norouzi and Zhurkin included flexible linker DNA, nucleosomal DNA unwrapping, and interacting nucleosomes; however, their simulations focused only on the zig-zag structure [143–145].

Figure 1.11

The structure of both DNA and chromatin can be coarse-grained into rigid body models. a) The structure of DNA can be simulated as a chain of rigid base pairs. Each base pair has six degrees of freedom with respect to its neighbor: three translational ones (shift, slide, rise) and three rotational ones (tilt, roll, twist). Figure reprinted from [146]. b) The orientation of stacked nucleosomes can also be described by step parameters, as described by Korolev et al. Figure adapted from [147].

In Chapter 4 of this thesis, we introduce MC simulations of chromatin fibers that include flexible linker DNA, geometrically unconstrained nucleosomes, nucleosome stacking, nucleosomal DNA unwrapping, DNA handles, and various higher-order structures. In this way, we could faithfully simulate force spectroscopy on single chromatin fibers in silico.

We based our approach on the rigid base pair model [148]. Rigid base pair Monte Carlo (rbMC) simulations compute DNA trajectories by treating each base pair as a rigid body that can move with respect to its neighbor. The relation between one base pair and its neighbor is determined by six degrees of freedom, which are depicted in Figure 1.11a. The degrees of freedom are divided into translational step parameters (shift, slide, rise) and rotational step
