Transcriptional Gene Regulation in Eukaryotes
Overview
Gene expression
Transcription
Regulation of eukaryotic transcription
Influence of chromatin structure
Oncogenes
Techniques
Orphanides, Cell 2002
Control of gene expression at any stage:
Activation of gene structure
Initiation of transcription
What is a gene?
“The entire nucleic acid sequence that is
necessary for the synthesis of a functional
polypeptide or RNA molecule”
Transcription
Initiation, elongation, termination
Catalyzed by RNA polymerase
– “Transcription bubble”: DNA transiently separated into single strands
– One strand is used as a template
– Unwinding point & rewinding point
– Rate: about 40 nucleotides/second at 37 °C in bacteria
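As a rough worked example of the rate quoted above, the time needed to transcribe a gene scales linearly with its length. The sketch below assumes a constant elongation rate and ignores initiation, pausing and termination; the 1,200-nucleotide gene length is a hypothetical value chosen only for illustration.

```python
RATE_NT_PER_S = 40  # bacterial elongation rate at 37 °C, from the slide


def transcription_time_s(gene_length_nt, rate=RATE_NT_PER_S):
    """Seconds needed to transcribe gene_length_nt at a constant rate."""
    return gene_length_nt / rate


# Hypothetical 1,200-nt gene: 1200 / 40 = 30 seconds.
print(transcription_time_s(1200))
```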
RNA polymerase
– Many subunits: catalytic site, CTD with (YSPTSPS)n repeats
– pol I, pol II, pol III
Initiation
Bacteria
Molecular details of gene expression control in bacteria: lac operon of E. coli (Jacob & Monod, 1960s)
Bacterial enhancer
Glutamine synthetase
Eukaryotes
Basal transcription apparatus (general factors & RNA polymerase)
Proximal cis-regulatory module
Distal cis-regulatory modules
Modules = discrete DNA elements that contain specific
sequence motifs with which DNA binding proteins interact and transmit molecular signals to genes
Promoter
Enhancer
BTA
– General factors: TFIIx
– Mechanics of initiating RNA synthesis at all promoters
– Determines location of transcription startpoint
– Complex with RNA polymerase
– TATA
• ~25 bp upstream
• 8 bp consensus of A•T pairs
• Tends to be surrounded by G•C-rich regions
• TBP + 11 TAFs = TFIID (~800 kDa)
– TATA-less promoters:
• Inr: Py2CAPy5 (-3 to +5)
– Promoter prediction: TATA box, C-G enrichment
– 50%
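The Inr consensus Py2CAPy5 can be written as a regular expression, with Py standing for a pyrimidine (C or T). The sketch below is a minimal pattern search, not a promoter predictor; the function name `find_inr` and the example sequence are hypothetical and made up for illustration.

```python
import re

# Py2CAPy5 as a regular expression (Py = C or T).
# The lookahead lets overlapping candidate sites be reported too.
INR = re.compile(r"(?=([CT]{2}CA[CT]{5}))")


def find_inr(seq):
    """Return (0-based start, matched 9-mer) for every Inr-like match."""
    return [(m.start(), m.group(1)) for m in INR.finditer(seq.upper())]


# Hypothetical sequence, invented for illustration.
example = "GGATCCTCATTCTCGGAT"
print(find_inr(example))  # [(5, 'CTCATTCTC')]
```

A match of this kind only shows that the sequence fits the consensus; whether it functions as an initiator depends on context.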
[Figure: comparison of TATA and Inr promoter elements; labels: Eponine, PromoterInspector]
Promoter-proximal region
Efficiency and specificity of transcription depend
on binding of transcription factors
Promoter recognition
Function = to be recognized by proteins; so differs from exon, …
Any essential nucleotide sequence should be conserved
– Some variation is permitted
– When is it sufficiently conserved?
– Idealized sequence with the base most often present at each position: the consensus sequence, obtained by aligning all known examples
– Only very short sequences are conserved; the 60 bp associated with RNA pol lack conservation
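Deriving a consensus from aligned examples can be sketched in a few lines. The code below takes the most frequent base in each column and also reports how often that base occurs, which is one simple way to approach the question of when a position is sufficiently conserved. The aligned sites are a toy alignment invented for illustration, not real promoter data, and the function names are hypothetical.

```python
from collections import Counter


def consensus(aligned_sites):
    """Most frequent base at each column of equal-length aligned sites."""
    length = len(aligned_sites[0])
    if any(len(s) != length for s in aligned_sites):
        raise ValueError("sites must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*aligned_sites))


def conservation(aligned_sites):
    """Fraction of sites that carry the consensus base, per column."""
    n = len(aligned_sites)
    return [Counter(col).most_common(1)[0][1] / n
            for col in zip(*aligned_sites)]


# Toy alignment, invented for illustration (not real promoter data).
sites = ["TATAAAAG", "TATATAAG", "TATAAATA", "CATAAAAG"]
print(consensus(sites))     # TATAAAAG
print(conservation(sites))  # [0.75, 1.0, 1.0, 1.0, 0.75, 1.0, 0.75, 0.75]
```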
Variety of elements can contribute, none is essential for all promoters (mix & match principle)
CAAT box ~ -80bp GGCCAATCT
– increases promoter strength
– Bound by CTF/NF1 family, CP1 & CP2, C/EBP, ACF
GC box GGGCGG
– SP1
Octamer (8bp) ATTTGCAT
– Bound by Oct1 (ubiquitous): activates histone H2B
– Bound by Oct2 (lymphoid cells): Ig kappa light chain
– context is important
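The three consensus elements listed above can be located in a sequence by a simple scan of both strands. This is a sketch under a strong simplifying assumption: it looks only for exact matches to the consensus, whereas real sites tolerate some variation. The example sequence was built by hand for illustration, and the helper names are hypothetical.

```python
# Consensus sequences as given on the slide.
MOTIFS = {
    "CAAT box": "GGCCAATCT",
    "GC box": "GGGCGG",
    "octamer": "ATTTGCAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan(seq):
    """List (name, 0-based start, strand) for exact consensus matches."""
    seq = seq.upper()
    hits = []
    for name, motif in MOTIFS.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, start, strand))
                start = seq.find(pattern, start + 1)
    return hits


# Hypothetical sequence, constructed for illustration.
example = "TTGGCCAATCTAACCGCCCTTATTTGCATGG"
for hit in scan(example):
    print(hit)
```

On this example the scan reports the CAAT box at position 2 (+ strand), the GC box at position 13 (- strand) and the octamer at position 21 (+ strand).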
Modular nature of the promoter:
– Equivalent regions can be exchanged
– Main purpose = to bring the factors they bind into the vicinity of the initiation complex
– Protein-protein interactions determine the efficiency of the initiation reaction
Sequence elements influence the frequency of initiation
Repression of transcription:
– Generally by influencing chromatin structure
– By repressors, e.g. Dr1/DRAP1 (binds to TBP) and CAAT displacement protein (CDP)
Modules
Enhancers, silencers
5’ region, distal
Modules
50 bp to 1.5 kbp in size
4-8 TFs (often multiple sites); higher density of regulatory elements than in the promoter
Many elements are also common in promoters, e.g. AP1 and the octamer
Can stimulate any promoter placed in its vicinity
Can function anywhere (cf. β-globin: 200-fold in vivo)
Position relative to promoter can vary substantially; can function in either orientation
Binding sites for activators that control transcription of the mouse transthyretin (TTR) promoter in hepatocytes. HNF = hepatocyte nuclear factor. [See R. Costa et al., 1989, Mol. Cell Biol. 9:1415; K. Xanthopoulos et al., 1989, Proc. Nat’l. Acad. Sci. USA 86:4117.]
Example: TTR
Example: muscle specific modules
Example: β-globin
Model for the control of the human β-globin gene. Some of the gene regulatory proteins shown, such as CP1, are found in many types of cells, while others, such as GATA-1, are present in only a few types of cells including red blood cells and therefore are thought to contribute to the cell-type specificity of β-globin gene expression. (Adapted from B. Emerson, In Gene Expression: General and Cell-Type-Specific [M. Karin, ed.], pp. 116-161. Boston: Birkhauser, 1993.)
How?
Current view:
– same sort of interaction with basal apparatus as the proximal promoter module
– Increase the concentration of transcription factors in the vicinity of the promoter
Intervening DNA: extruded as a large “loop”
Generality: not yet clear (what proportion of
promoters require an enhancer?)
Four activators enriched in hepatocytes plus the ubiquitous AP1 factor bind to sites in the hepatocyte-specific enhancer and promoter-proximal region of the TTR gene. The activation domains of the bound activators interact extensively with co-activators, TAF subunits of TFIID, Srb/Mediator proteins, and general transcription factors, resulting in looping of the DNA and formation of a stable activated initiation complex.
Cooperative assembly
Limited knowledge
Experimentally verified binding sites
Experimentally verified “composite elements” or CE’s
– GR site + AP-1 in proliferin promoter
– Synergistic: result in a non-additively high level of transcription
– Antagonistic: overlapping sites, masking an activation domain,…
– Direct or through coactivator
Few modules characterized that have multiple
elements, some in developmental biology
Side-track: Transcription factors
5% of our proteins
Activities controlled in regulatory pathways
Independent domains responsible for activities:
– Recognition of specific target sequences
– Binding to other components of the transcription apparatus
– E.g. yeast GAL4
Protein-DNA interactions
– Proteins with high affinity for a specific sequence also possess a low affinity for any (random) DNA sequence
– E.g. Lac repressor of E. coli: free:bound = 10^-4
– High-affinity site competes with the large number of low-affinity sites; repressor binds 10^7 times better to operator DNA (bound 96% of time for 10 molecules/cell)
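The 96% figure can be reproduced with a back-of-the-envelope competition model. The sketch below assumes that essentially all repressor sits on nonspecific DNA, so the bound:free ratio of the operator is the specificity multiplied by the number of repressors and divided by the number of nonspecific sites. The number of nonspecific sites (taken as roughly the E. coli genome size in base pairs) is an assumption that does not appear on the slide, and the function name is hypothetical.

```python
def operator_occupancy(n_repressors, specificity, n_nonspecific_sites):
    """Fraction of time a single operator is occupied.

    Assumes nearly all repressor is bound to nonspecific DNA, so the
    bound:free ratio of the operator is
    specificity * n_repressors / n_nonspecific_sites.
    """
    ratio = specificity * n_repressors / n_nonspecific_sites
    return ratio / (1.0 + ratio)


occ = operator_occupancy(
    n_repressors=10,            # molecules per cell, from the slide
    specificity=1e7,            # operator vs. random DNA, from the slide
    n_nonspecific_sites=4.6e6,  # ASSUMPTION: approx. E. coli genome size (bp)
)
print(round(occ, 2))  # 0.96
```

Under these assumptions, raising the number of repressor molecules per cell pushes the occupancy closer to 100%, which is why the concentration of the factor matters as much as its affinity.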
How the different base pairs in DNA can be recognized from their edges without the need to open the double helix.
The binding of a gene regulatory protein to the major groove of DNA. Typically, a protein-DNA interface consists of 10 to 20 such contacts, involving different amino acids, each contributing to the binding energy of the protein-DNA interaction.
Zinc finger motif
– Common motif in DNA binding, e.g. SP1 has 3 zinc fingers
(A) The structure of a fragment of a mouse gene regulatory protein bound to a specific DNA site. This protein
recognizes DNA using three zinc fingers of the Cys-Cys-His-His type arranged as direct repeats. (B) The three
All of the proteins bind DNA as dimers in which the two copies of the recognition helix (red cylinder) are separated by exactly one turn of the DNA helix (3.4 nm). The second helix of the helix-turn-helix motif is colored blue. The lambda repressor and cro proteins control
Helix-Turn-Helix
Homeodomains
– Related to helix-turn-helix bacterial repressors
– Homeobox encodes a homeodomain of 60 AA residues
– E.g. en, eve, Hox, Oct-1, Oct-2 (Oct proteins also have a POU domain next to the homeodomain)
The homeodomain is folded into three alpha helices, which are packed tightly together by hydrophobic interactions (A). The part containing helix 2 and 3 closely resembles the helix-turn-helix motif, with the recognition helix (red) making important contacts
Helix-loop-helix (HLH)
– DNA binding (helix) & dimerization
– Class A: ubiquitously expressed proteins, e.g. E12/E47
– Class B: tissue-specific expression, e.g. MyoD, myogenin, Myf-5
– Myc proteins (separate class)
Leucine zippers fig 21.15
– Dimerization motif
– E.g. Jun+Fos = AP1
– Gcn4 ->
Steroid receptors
– Independent domains: DNA binding, hormone binding, and dimerization
Cortisol - glucocorticoid receptor (GR).
Retinoic acid - retinoic acid receptor (RAR).
Thyroxine - thyroid hormone receptor (TR).
Figure 1 Genome-wide comparison of transcriptional activator families in eukaryotes.
The relative sizes of transcriptional activator families among Homo sapiens,
D. melanogaster, C. elegans and S. cerevisiae are indicated, derived from an analysis of
eukaryotic proteomes using the INTERPRO database, which incorporates Pfam, PRINTS
Alternative model
Transcription factories
cf. replication factories
Active RNA polymerases are concentrated in discrete 'factories' where they work together on many different templates
Complexes for transcription and RNA processing are likely to be immobile structures within the gel-like nucleoplasm
(Burns et al, 2001; Kimura et al, 1999)
Transcriptional interference: phenomenon where transcription of one gene prevents transcription of an adjacent gene.
Discovery: cells were transfected with a retroviral vector encoding resistance to neomycin and azaguanine, and clones harboring a single copy of the vector were selected. Expression of the 3' gene was suppressed when selection required expression of the 5' gene, and vice versa. In addition,
Cook, 1999 (Science)
• Enhancers
• Dynamic equilibrium
Recap: evolution of understanding of eukaryotic transcription
Termination
Bacteria
Eukaryotes (?); by RNA processing
Regulation of eukaryotic transcription
Activate/inactivate a TF
Transport through nuclear pores from cytoplasm to nucleus (e.g. masking NLS, nuclear localization signal, can regulate this transport)
Link to ubiquitin-proteasome system
– Rapid turnover of promoter bound TF: resets signaling pathway: cell can continuously monitor its environment
Tissue-specific synthesis
– Development, e.g. homeodomain proteins
Modification
– Phosphorylation, acetylation, methylation
– E.g., AP1 (= Jun+Fos): active form by phosphorylation
– E.g., p53 acetylated (modulates interactions with coactivator and repressor proteins)
Ligand binding
– E.g. Steroid receptors
– Influence: localization or DNA-binding ability
Cleavage
Inhibitor release
– E.g. NF-κB + I-κB (release in B lymphocytes)
Change of partner (active partner displaces
inactive partner)
Examples:
GATA-1; CAP; NtrC; Adenovirus E1A + CBP/p300; NF-κB/glucocorticoid receptor
Pathways…
Level 1 = active/inactive factor
Level 2 = cooperation of multiple factors within a module (all present and active, and all repressors inactive or absent)
Level 3 = multiple autonomous modules per gene
– Each module can independently activate the gene
– Each has a specific function (e.g. activation in a certain cell type or at a particular stage in development)
– different circuits of regulation, e.g. metallothionein gene (MT): heavy metals and steroids, fig 21.1
– Gene can respond to multiple signaling pathways
– Facilitates fine-tuning of transcript levels
Combinatorial and context dependent regulation of transcription
– One factor can induce transcription of one gene while repressing that of another
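The three levels above read naturally as Boolean logic: a set of currently active factors (level 1), an AND-with-no-repressor rule inside each module (level 2), and an OR across autonomous modules (level 3). The sketch below is one way to express that reading; the module layout and the factor names A1, A2, B1 and R1 are hypothetical and not taken from any real gene.

```python
def module_active(module, active_factors):
    """Level 2: all activators present and active, no repressor active."""
    return (all(f in active_factors for f in module["activators"])
            and not any(f in active_factors for f in module["repressors"]))


def gene_on(modules, active_factors):
    """Level 3: any one autonomous module suffices to activate the gene."""
    return any(module_active(m, active_factors) for m in modules)


# Hypothetical modules and factor names, invented for illustration.
modules = [
    {"activators": {"A1", "A2"}, "repressors": {"R1"}},
    {"activators": {"B1"}, "repressors": set()},
]

# Level 1 is the set of factors that are currently active.
print(gene_on(modules, {"A1", "A2"}))        # True: first module complete
print(gene_on(modules, {"A1", "A2", "R1"}))  # False: repressor blocks it
print(gene_on(modules, {"A1", "R1", "B1"}))  # True: second module fires
```

Real modules integrate inputs quantitatively rather than as strict on/off switches, so this Boolean form is only a first approximation.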
Example: eve
Experiment demonstrating the modular
construction of the eve gene regulatory region.
(A) A 480-nucleotide-pair piece of the eve regulatory region was removed and inserted upstream of a test promoter that directs the synthesis of the enzyme β-galactosidase (the product of the E. coli lacZ gene). (B) When this artificial construct was reintroduced into the genome of Drosophila embryos, the embryos expressed β-galactosidase (detectable by histochemical staining) precisely in the position of the
rho
• Dorsal (Dl)
• Twist (HLH)
• a HLH
• Snail (-)
Principles for specification
1. cis-regulatory transformation of input patterns into spatial domains of differential gene expression
2. Always assemblages of diverse target sites because multiple inputs are required
3. Output=novel with respect to any one of the incident inputs + more precise in space and time => “information processing”
4. Every specific type of interaction that can be detected in vitro is fundamentally significant (it is unlikely that highly specific site clusters, which are improbable to occur at random, would have no function)
5. Negative & positive inputs
Cis-regulatory logic device
endo16 of Strongylocentrotus (sea urchin)
Secreted embryonic gut protein
“hardwired biological computational device”
Influence of chromatin structure
Chromatin
Eukaryotic genomes are packaged with chromatin proteins
Heterochromatin (highly condensed, untranscribed)
Euchromatin (more accessible, transcribed)
Each cell: unique pattern of heterochromatin and
euchromatin
Nucleosomes
• 146 bp
• H2A, H2B, H3, H4
Chicken and egg scenario
TF binding requires chromatin decompaction by certain factors, but the latter also need to interact with DNA
Solution: probably some TFs can bind to their recognition sequences even when they are packaged (e.g. glucocorticoid receptor only contacts DNA on one side, whereas NF1 surrounds the double helix)
Modify chromatin structure
1. ATP-dependent remodeling
2. Histone-modifying complexes
• Phosphorylation, methylation, acetylation
• Histone acetyltransferase (HAT), histone deacetylase (HDAC)
• How do they impact the structure of the template and the ability of the transcription machinery to function?
• Lowered positive charge on acetylated N-termini lowers the stability of the interaction with DNA
• Disrupting internucleosomal interactions
• Recruiting additional TFs
• A lot of combinatorial possibilities: histone code?
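The charge argument can be made concrete by counting charged residues in a histone tail before and after lysine acetylation. The sketch below is deliberately crude: it counts Lys and Arg as +1 and Asp and Glu as -1, and ignores histidine and the termini. The peptide is the commonly cited first 20 residues of histone H4, and the acetylated positions (K5, K8, K12, K16) are its usual acetylation sites; both are included as assumptions rather than taken from the slides.

```python
def net_positive_charge(peptide, acetylated_positions=()):
    """Count Lys/Arg positive charges minus Asp/Glu negative charges.

    acetylated_positions holds 1-based lysine positions whose charge is
    neutralized. Histidine and the chain termini are ignored.
    """
    charge = 0
    for pos, aa in enumerate(peptide, start=1):
        if aa == "R" or (aa == "K" and pos not in acetylated_positions):
            charge += 1
        elif aa in "DE":
            charge -= 1
    return charge


# ASSUMPTION: commonly cited N-terminal 20 residues of histone H4.
H4_TAIL = "SGRGKGGKGLGKGGAKRHRK"

print(net_positive_charge(H4_TAIL))                  # 8
print(net_positive_charge(H4_TAIL, {5, 8, 12, 16}))  # 4
```

In this simplified count, acetylating four lysines halves the positive charge of the tail, which fits the idea of a weaker interaction with the negatively charged DNA backbone.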
Model of the protein interactions and functions of the Myc/Max/Mad transcription network.
Myc-Max and Mad-Max (along with Mnt-Max and Mga-Max) complexes bind to DNA at E-boxes. Binding can be affected by the context, sequence, cooperativity, and location of the E-boxes. Myc-Max heterodimers activate transcription by recruiting HATs via TRRAP. This leads to the acetylation of histone tails and the opening of local chromatin structure. Additionally, Myc-Max appears to repress transcription through Inr elements via an undefined mechanism. As a result of these activities at target genes, Myc affects proliferation, cell cycle, growth, immortalization, and apoptosis. When deregulated, Myc cooperates with other oncogenes to cause a variety of cancers.
Grandori C, Cowley SM, James LP, Eisenman RN. Annu Rev Cell Dev Biol. 2000;16:653-99