
Large scale analysis of ChIP-seq data indicates that KRAB zinc-finger proteins have gene regulatory properties


Academic year: 2021



Stijn F. Robben

Supervisors: Gerrald Lodewijk & Frank Jacobs

Throughout evolution, genomes have been protected from evolving transposable elements by co-evolving KRAB zinc-finger proteins (KZFPs), which can repress further insertions of these self-copying genetic elements. Recent findings imply that zinc finger protein 675 (ZNF675) can also bind to gene promoters and affect transcription of the corresponding gene. It is unknown whether other KZFPs show such characteristics. This thesis examined whether the transcription of genes can be affected by KZFPs. A large set of ChIP-seq data, revealing possible KZFP binding sites, was analyzed to investigate this matter. The expression of target genes was correlated with the expression of the corresponding KZFP to see whether any patterns emerge. The results imply that KZFPs show gene regulatory characteristics, with dynamics similar to those of better-studied transcription factors. Further preliminary analysis revealed that KZFPs may show forms of co-regulation. These findings create a foundation for future wet-lab experiments and increase the overall knowledge of KZFP functioning.

Introduction

The understanding of transcription factors (TFs) has greatly increased in the past decades. TFs are proteins that bind to DNA, most commonly near gene transcription start sites, but also in other areas (e.g. enhancers). The binding of TFs regulates gene transcription and is highly cell-type specific (Mitchell & Tjian, 1989). A sub-family of the TFs consists of the zinc finger domain-containing TFs, which are characterized by having one or more zinc ions that stabilize certain folds (Brown, 2005). Around 350 genes in the human genome encode zinc-finger proteins that additionally contain a Krüppel-associated box (KRAB) domain (KZFPs). These KZFPs are distinguished by their ability to repress transposable element activity, preventing these self-amplifying genetic elements from spreading further throughout the genome. If the amplification of transposable elements is not repressed, they can insert near or into genes, possibly affecting crucial parts of their genetic sequence or their overall regulation (Cordaux & Batzer, 2009).

The importance of these KZFPs was further underlined when Jacobs et al. (2014) uncovered the evolutionary aspects of these proteins. That study showed that two primate-specific KZFPs rapidly evolved to repress two transposable element families shortly after these spread throughout the primate genome. These findings strongly support the hypothesis that KZFPs have played an important role in the evolution of modern humans.

In addition, several studies uncovered binding motifs of almost 200 KZFPs. These binding motifs did not solely belong to transposable elements, but also to several other similar regulatory sequences (Najafabadi et al., 2015; Schmitges et al., 2016; Imbeault et al., 2017). This implies that KZFPs might also bind to sequences that are not part of transposable elements, and thereby influence the functioning of those sequences as well.

Recent unpublished results showed that ZNF675 can bind to promoter regions of genes that are involved in brain development and regulate their transcription. The structural changes of ZNF675 that made it gain these properties were driven by transposable elements in primate evolution. This would imply that, over time, zinc fingers gained the important secondary property of regulating gene transcription as a consequence of evolving transposable elements.

It is unknown whether more KZFPs show gene regulatory properties similar to those of ZNF675. In this thesis, ChIP-sequencing data was analyzed to investigate the binding patterns of KZFPs and to determine whether other KZFPs also show gene regulatory characteristics. It is hypothesized that more KZFPs have indeed adopted a role in gene regulation.

To investigate this, large scale processing and analysis of ChIP-sequencing data was performed to systematically examine the binding patterns of KZFPs on gene promoters. Based on the ChIP-seq data, a target gene list was made for each KZFP. Using Brainspan data (Miller et al., 2014), the expression level of a specific KZFP was then correlated with the expression of all genes and with that of its target genes. The distributions of the correlation values were plotted and compared for both sets of genes. The same analysis was done using ENCODE ChIP-seq data (Consortium et al., 2012) for TFs that have a clear function in gene regulation. This data served as a reference against which the KZFP data was compared. In addition, the binding patterns of the KZFPs were analyzed further to display the complexity of regulation of KZFPs between each other. Two controls were chosen to verify the correlation data. The first repeats the analysis with target genes randomly assigned to a KZFP. The second attributes the target genes of one KZFP to another KZFP (crossover analysis).

If zinc fingers affect the transcription of genes, the expression correlation distribution of the genes that have a KZFP binding site on their promoter region is expected to differ from the expression correlation distribution of all genes in the brain relative to that same KZFP. It is also anticipated that the expression correlation distributions of the KZFP target genes are to some extent similar to those of the target genes of regular TFs. If target genes are specific for a certain zinc finger, the crossover analysis is expected to result in a random correlation distribution of target genes.

Materials & Methods

[All scripts were developed using Python (Rossum, 1995)]

Validating MACS peaks. The ChIP-seq data used for this experiment was generated by Schmitges et al. (2016) (NCBI GEO database accession number GSE76496), Imbeault et al. (2017) (NCBI GEO database accession number GSE78099) and Najafabadi et al. (2015) (NCBI GEO database accession number GSE52523). To validate the MACS peak data, the ChIP-seq samples were mapped using online tools from www.usegalaxy.org (Wickham, 2009). The files were downloaded as fastq.gz files and were first converted to fastq, followed by a conversion to fastqsanger. No paired-end reads were used, and any adaptor and other Illumina-specific sequences were cut from the reads. The number of bases to keep from the start of each read was set to 50. Mapping to the reference genome hg19 was performed using Bowtie2 (Langmead & Salzberg, 2012) with default settings. Any potential PCR duplicates were removed with the samtools RmDup tool (Li et al., 2009). To generate the BigWig file, the tool bamCoverage was used with a bin size of 1 base and with missing data ignored.

Analysis MACS-peak data. To distinguish the binding sites of the zinc fingers from noise, a cut-off of 500 for the p-value score (−10 · log10(p-value)) was determined through visual observation of the MACS peaks using the UCSC genome browser (Kent et al., 2002). The locations of the remaining peaks were compared with the locations of gene promoter regions, which were acquired from the UCSC data hubs (Raney et al., 2014). These regions were adjusted according to the Genomic Regions Enrichment of Annotations Tool (GREAT) standards: 1000 bp downstream and 5000 bp upstream from a gene's transcription start site (McLean et al., 2010). Whenever the location of a gene's promoter region overlapped with a peak's location for at least 50%, the information of that peak, together with its gene's promoter region and annotation, was documented in a separate Excel file for each zinc finger. The RefSeq annotation of the gene from GREAT was converted to gene symbols using biodbnet-abcc.ncifcrf.gov (Mudunuri et al., 2009). All of the above steps were computed in Match_final.py in the functions match_prom() and match_symbols(). Further details about the functioning of the scripts can be found in the comments in the Python file.
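The matching step described above can be illustrated with a minimal Python sketch. This is not the code of match_prom(); the function names, the dictionary layout of peaks and genes, and the gene symbols in any example are hypothetical. Two points are assumptions, because the text does not specify them: the 50% overlap is taken as the fraction of the peak that lies inside the promoter region, and peaks with a score exactly equal to the cut-off are kept.

```python
import math

UPSTREAM_BP = 5000    # extension upstream of the transcription start site (TSS)
DOWNSTREAM_BP = 1000  # extension downstream of the TSS
SCORE_CUTOFF = 500    # cut-off on the score -10 * log10(p-value)
MIN_OVERLAP = 0.5     # assumption: fraction of the peak inside the promoter


def macs_score(p_value):
    """Return the p-value score -10 * log10(p-value)."""
    return -10.0 * math.log10(p_value)


def promoter_window(tss, strand):
    """Return (start, end) of the promoter region, taking the strand into account."""
    if strand == "+":
        return tss - UPSTREAM_BP, tss + DOWNSTREAM_BP
    return tss - DOWNSTREAM_BP, tss + UPSTREAM_BP


def overlap_fraction(peak, window):
    """Return the fraction of the peak (start, end) that lies inside the window."""
    peak_start, peak_end = peak
    win_start, win_end = window
    length = peak_end - peak_start
    if length <= 0:
        return 0.0
    shared = min(peak_end, win_end) - max(peak_start, win_start)
    return max(shared, 0) / length


def match_peaks(peaks, genes):
    """Return (symbol, peak_start, peak_end) for strong peaks on a promoter."""
    hits = []
    for peak in peaks:
        if peak["score"] < SCORE_CUTOFF:
            continue  # treated as noise
        for gene in genes:
            if gene["chrom"] != peak["chrom"]:
                continue
            window = promoter_window(gene["tss"], gene["strand"])
            span = (peak["start"], peak["end"])
            if overlap_fraction(span, window) >= MIN_OVERLAP:
                hits.append((gene["symbol"], peak["start"], peak["end"]))
    return hits
```

The nested loop is quadratic in the number of peaks and genes; for genome-wide data an interval index per chromosome would be the more usual choice, but the logic of the comparison stays the same.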

Binding sites graph. The graph that displays the number of binding sites per zinc finger was made using a combination of HTML and d3.js. The framework of the visualization, including the definition of the svg, titles and backgrounds, was built in HTML. The images were produced with the JavaScript library d3.js. The data and results were formatted, using Python, Sublime Text and PyCharm, to either .JSON or .csv format. The data concerning the primate specificity of KZFPs was obtained from the studies of Schmitges et al. (2016) and Imbeault et al. (2017). Further details of the scripts and their functioning are commented in the code itself.

Histograms. To obtain the data for the histograms of the zinc fingers, the function API_use(znf_id, znfs) in the Python file api.py was created. This function downloads the expression correlation values of all the genes in the brain in relation to every zinc finger. This is done in steps of 2000 genes due to restrictions of www.brainspan.com (Sunkin et al., 2013). Subsequently, correlation_znf() looks up, for every zinc finger, the expression correlation values of its target genes with respect to that zinc finger and documents them. hist(ZNF) visualizes the distribution of the expression correlation values of the target genes and the distribution of the expression correlation values of all genes in one plot, for every zinc finger. Matplotlib (Hunter, 2007) was used to generate the histograms, with the settings binssize = 100.0 and scaled = 1.0, so that the bins represent the chance of genes having a particular correlation value within the range -1 to 1.
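The batching and the normalized histograms described above can be sketched as follows. This is a minimal illustration in plain Python, not the code of API_use() or hist(); all function names are hypothetical, no request to Brainspan is made, and the default of 100 bins is only an assumed reading of the binssize setting. Each bin holds the fraction of genes whose correlation value falls inside it, so the distribution of the target genes and that of all genes can be drawn on the same scale.

```python
def batches(items, size=2000):
    """Yield consecutive chunks of at most `size` items (e.g. gene identifiers)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def correlation_histogram(values, n_bins=100, low=-1.0, high=1.0):
    """Return the fraction of values in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - low) * n_bins / (high - low))
        index = min(max(index, 0), n_bins - 1)  # value == high goes in the last bin
        counts[index] += 1
    total = len(values)
    if total == 0:
        return [0.0] * n_bins
    return [count / total for count in counts]


def target_vs_all(correlations, target_genes, n_bins=100):
    """Return (target histogram, all-genes histogram) for one zinc finger.

    `correlations` maps gene symbol -> correlation with the zinc finger.
    Target genes without a correlation value are skipped.
    """
    target_values = [correlations[g] for g in target_genes if g in correlations]
    all_values = list(correlations.values())
    return (
        correlation_histogram(target_values, n_bins),
        correlation_histogram(all_values, n_bins),
    )
```

The two returned lists can be passed to any plotting library as bar heights over the same bin edges.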

Histograms of the regular TFs were created in a similar fashion. To obtain the binding sites of the TFs on the genome, data from the ENCODE database was used (Consortium et al., 2012). This file was downloaded in .narrowPeak format and processed with the functions match_prom_trans(tscf) and match_symbols_trand(tscf) in the Match_final.py file, following the same steps as described in the subsection Analysis MACS-peak data. For the correlation data and histogram creation, the same functions and parameters were used as for the zinc finger expression correlation histograms.

The crossover histograms were produced with the crossover_hist(ZNF1, ZNF2) script. This function documents the target genes of one zinc finger and plots the correlation of those genes relative to another zinc finger in a histogram.

Histograms portraying distributions of random genes were made using the rand_check_hist(ZNF, count) script. The function chooses as many random genes as the zinc finger has target genes. The genes were chosen randomly using the NumPy Python library. The distributions of the corresponding correlation values of those genes were plotted together with the distribution of the correlation values of all genes in the brain relative to that same zinc finger.
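The two control analyses, random assignment and crossover, can be sketched in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and it uses the standard library random module instead of NumPy so that it runs without dependencies. The optional seed is an addition that makes a random control reproducible.

```python
import random


def random_control(all_genes, n_targets, seed=None):
    """Draw as many random genes (without replacement) as there are target genes."""
    rng = random.Random(seed)
    return rng.sample(sorted(all_genes), n_targets)


def crossover_values(targets_of_first, correlations_to_second):
    """Correlation values of one zinc finger's targets relative to another zinc finger.

    `correlations_to_second` maps gene symbol -> correlation with the second
    zinc finger; target genes without a value are skipped.
    """
    return [
        correlations_to_second[gene]
        for gene in targets_of_first
        if gene in correlations_to_second
    ]
```

In both cases the resulting list of genes or values is then binned in the same way as the real target genes, so that the control and the all-genes distribution can be compared in one plot.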

Heatmaps. Data for the heatmaps was obtained through the Brainspan Gene Expression tool (Miller et al., 2014). The function data_maker(brainarea) converts the raw Brainspan data from .csv format to a readable Excel file, for each zinc finger with >50 target genes. For the overall expression heatmap, data_maker(brainarea) calculates the average expression per time step over all brain areas. Time steps do not exceed 30 years, due to inconsistency of the data. The heatmap visualization was made with R (R Development Core Team, 2008) using the heatmap.2 function from the library gplots. The color scale 'redgreen' was used, and the dendrogram represents the clustering of the data according to the zinc finger expression rows.

The heatmap displaying the expression correlation distributions of the target genes of every zinc finger was created using heatmap_exp_cor_dis() in the file API.py. This function looks up the correlations of the target genes for every zinc finger. It assigns each segment of the heatmap a value according to the number of genes that fall within the correlation range covered on the x-axis by that segment. For this heatmap the color scale 'blackred' was used. The bins ranging from the correlation values -1.0 to -0.8 and 0.8 to 1.0 were removed because no values fell within these ranges.

The heatmap displaying the expression correlation distributions of the target genes of the common TFs was generated with heatmap_tscf_correlation() in the file API.py. This function works in a similar fashion to heatmap_exp_cor_dis(). More information about the exact differences and functioning is commented in the code itself.

Network. The data for the network graph was generated by the network_graph() function. This function loops through the files that were generated earlier by the scripts in the file Match_final.py. For every zinc finger, it checks whether it has a binding site on the promoter region of another analyzed zinc finger. If this is the case, the function documents the data of these zinc fingers in a node and an edge file, which can be interpreted and visualized by Gephi 0.9.1 (Bastian et al., 2009). In Gephi, the cluster algorithm OpenOrd (Martin et al., 2011) was used. For clarity of the graph, only binding sites with a MACS-peak value score of >1500 were visualized. The color and the width of the edge between the nodes represent the MACS-peak value, with higher values shown in a darker red and with a wider edge than lower values. An image of the complete network can be found in the supplementary data. The interactive map is added as a digital zip file under the name interactions.ghepi.
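The construction of the node and edge lists can be sketched as follows. This is a hedged illustration, not the code of network_graph(): the function names and the input layout (a dictionary from each analyzed zinc finger to its target genes with peak scores) are hypothetical, and the CSV column names Id, Label, Source, Target and Weight are assumed to be what the spreadsheet import of Gephi accepts.

```python
import csv

EDGE_CUTOFF = 1500  # only peaks with a score above this value become edges


def build_network(bindings, cutoff=EDGE_CUTOFF):
    """Return (nodes, edges) for zinc fingers that bind each other's promoters.

    `bindings` maps each analyzed zinc finger to a list of
    (target gene symbol, peak score) pairs.
    """
    analyzed = set(bindings)
    edges = [
        (source, target, score)
        for source, hits in bindings.items()
        for target, score in hits
        if target in analyzed and score > cutoff
    ]
    nodes = sorted({name for source, target, _ in edges for name in (source, target)})
    return nodes, edges


def write_network_csv(nodes, edges, node_path, edge_path):
    """Write the node and edge tables as CSV files."""
    with open(node_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Id", "Label"])
        writer.writerows((name, name) for name in nodes)
    with open(edge_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target", "Weight"])
        writer.writerows(edges)
```

Storing the peak score as the edge weight lets the visualization tool map it to both edge color and edge width.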

Score table. To assign a score to each zinc finger that indicates to which degree it might be of interest for future research, the attributes expression, primate specificity, number of target genes and important genes were used. For each attribute, the zinc fingers were assigned to one of 4 classes. For expression, the clusters of the heatmap were used. Zinc fingers expressed highest throughout life were assigned class A (4 points), with classes B (3 points), C (2 points) and D (1 point) having gradually lower expression values. Primate-specific zinc fingers received class A and non-primate-specific zinc fingers received class D for this attribute. Zinc fingers with >1000 target genes received class A for this attribute, zinc fingers with between 500 and 999 target genes received class B, zinc fingers with between 100 and 499 received class C, and zinc fingers with fewer than 100 target genes received class D. The score is the total number of points over all attributes. The 'important gene' column received a check (x) if any of the target genes of that zinc finger is involved in a well-known important function.
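The scoring rules above amount to a small lookup, sketched here in Python with hypothetical function names. One boundary is an assumption: the text assigns class A to zinc fingers with >1000 target genes and class B to those with 500 to 999, so a zinc finger with exactly 1000 target genes is placed in class A here. The 'important gene' check mark carries no points and is therefore left out.

```python
CLASS_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}


def target_class(n_targets):
    """Return the class for the number of target genes."""
    if n_targets >= 1000:  # assumption: exactly 1000 counts as class A
        return "A"
    if n_targets >= 500:
        return "B"
    if n_targets >= 100:
        return "C"
    return "D"


def kzfp_score(expression_class, primate_specific, n_targets):
    """Return the total score over expression, primate specificity and target count."""
    classes = [
        expression_class,                  # A-D, taken from the heatmap clusters
        "A" if primate_specific else "D",  # only two outcomes for this attribute
        target_class(n_targets),
    ]
    return sum(CLASS_POINTS[c] for c in classes)
```

Under these rules the score ranges from 3 (class D on every attribute) to 12 (class A on every attribute).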


Results

To get a general overview of the zinc fingers and their functioning, it was first investigated which zinc fingers have the most binding sites on promoter regions of genes. The parameters and definitions used for this were determined as described in the Materials & Methods section. Figure 1 displays the zinc fingers with >50 target gene binding sites. Zinc fingers with a lower number of binding sites on promoter regions were left out to keep the visualization readable. The figure shows a rapid decline in the number of binding sites per zinc finger and shows no clear difference between newer KZFPs (primate-specific duplications) and ancient KZFPs. Non-primate-specific zinc fingers are displayed in a lighter blue. The details of the binding sites, including promoter region location, peak height and peak location, can be found in the supplementary data.

Next, the expression of each zinc finger with >50 binding sites was analyzed to examine which zinc fingers are expressed the most throughout the brain and at what age. The data in figure 2 was obtained from brainspan.com and shows the average zinc finger expression per age in RPKM over the different brain areas. Different clusters seem to appear: zinc fingers that are highly expressed throughout life, zinc fingers with a higher prenatal expression, and zinc fingers with a low expression throughout life. Surprisingly, this set contains no KZFPs with low expression during development and high expression after birth. These clusters were established with the hierarchical clustering method of the heatmap.2 function in R. A schematic overview can be seen in the dendrogram at the far left of the heatmap. The color key with the colors and their corresponding values in reads per kilobase million is displayed in the upper-left part of the image. This procedure was also carried out for the distinct brain areas Ventromedial Prefrontal Cortex (VPC) and Dorsolateral Prefrontal Cortex (DLPFC), because the data was most consistent in these areas. These heatmaps can be found in the supplementary data. All further analyses were performed only on this set of KZFPs, because a high number of target genes is needed for the analysis to remain valid.

With the information from these visualizations, the expression data of the genes with a zinc finger binding site on their promoter region was examined to see whether a certain correlation pattern could be found. To this end, the expression data of the target genes of every zinc finger with >50 binding sites was extracted from Brainspan. The distribution of the expression correlation values (between -1 and 1) was plotted in a histogram together with the expression correlation distribution of all genes in the brain relative to the same zinc finger.

Examples of these histograms can be observed on the right in figure 3. The heatmap in this figure serves as a schematic overview of all the different target gene correlation distributions. Each square represents the percentage of genes that falls within a bin. In combination with the same cluster algorithm as in figure 2, this image makes it easier to recognize clusters of zinc fingers with the same sort of target gene distribution. High percentages are shown in red and low percentages in black, as shown in the color key at the top left of the image. In the histograms, each of which represents the distributions of one cluster of the heatmap, the expression correlation values of all genes relative to the KZFP are displayed in red. The expression correlation values of the genes with a promoter region on which a KZFP binds are displayed in blue. The distribution of the expression correlation for all genes often shows a peak around correlation value 0. The expression correlation distribution of the genes with a KZFP binding site on their promoter can be uniform, skewed toward positive or negative values, or follow the distribution of all genes, depending on the cluster to which the KZFP belongs. Histograms of the other zinc fingers with >50 binding sites on promoter regions can be found in the supplementary data. Zinc fingers with <50 binding sites on gene promoter regions were not plotted because such low numbers allow trivial distributions to appear.

Figure 1. Number of binding sites on promoter regions per zinc finger.

Figure 2. Heatmap of Brainspan expression data for every zinc finger with >50 binding sites on promoter regions. Low expression is displayed in green, high expression in red. The order of the zinc fingers was determined through a cluster algorithm which orders the zinc fingers according to similar expression patterns. This clustering is represented by the dendrogram on the left of the image. Ages are displayed on the x-axis and the zinc fingers on the y-axis. Expression values were calculated by counting all nucleotides from reads that overlap with a given annotation entry. Subsequently, this value was normalized per million mapped nucleotides and per kilobase of item length (Miller et al., 2014).

To investigate whether target genes are specific for KZFPs, a cross-over analysis was performed, in which the target genes of one KZFP were plotted relative to another KZFP, as can be seen in figure 4. Figure 4a shows the correlation distributions of ZNF519 target genes and of all genes in the brain relative to ZNF519. Figure 4b shows the correlation distributions of randomly assigned target genes and of all genes in the brain relative to ZNF519. Here, both distributions align almost perfectly. Figure 4c shows the correlation distribution of target genes of ZNF202 relative to ZNF519, exhibiting almost no alignment of the 2 distributions. This may indicate that KZFPs do not act independently.

The same analyses were done on ENCODE ChIP-seq data of regular, better-known transcription factors, in order to use the TF results as a reference for those of the KZFPs. Again, a heatmap was created from the correlation distributions of the target genes of the TFs (figure 5). A representative example histogram of each cluster is displayed on the right. In every cluster, distribution patterns for the target genes similar to those in the heatmap of figure 3 can be observed. The same parameters were used for these histograms as for the histograms in figure 3.

Figure 3. Heatmap of the Brainspan correlation data for every zinc finger and its target genes. Each square represents the percentage of target genes that have a correlation covered on the x-axis by that square. As can be seen from the color key in the top left of the figure, high percentages are displayed in red and low percentages in black. The range of correlation (-1, 1) was divided into 10 bins to keep the visualization comprehensible. Components with 0% are shown blank. The borders of the bins are displayed on the x-axis, the different zinc fingers along the y-axis. The same cluster algorithm was used as for the previous heatmaps. Again, the clustering is shown in the dendrogram at the left of the figure. The outer correlation regions (-1.0 to -0.8 and 0.8 to 1.0) were left out because none of the zinc fingers had any data within these regions. A representative example of each cluster is displayed on the right.

Figure 4. Distributions of expression correlation values of KZFPs. In the histograms, each bin represents the fraction of genes with a correlation value that is covered by the bin on the x-axis. Correlation values were obtained from Brainspan, which performed a Pearson's correlation test on the expression data of each gene. For each graph, the distribution of target genes is displayed in blue, and the distribution of all genes is displayed in red. a, Distributions of expression correlation values of all genes and target genes in the brain relative to ZNF519. b, Distributions of expression correlation values of all genes and random target genes relative to ZNF519. c, Distributions of expression correlation values of the cross-over of ZNF202 target genes relative to ZNF519.

A similar cross-over analysis was also performed on these TFs, as can be observed in figure 6. Figure 6a shows the correlation distributions of SIN3A target genes and of all genes in the brain relative to SIN3A. Figure 6b shows the correlation distributions of randomly assigned target genes and of all genes in the brain relative to SIN3A. Here, both distributions align almost perfectly. Figure 6c shows the correlation distribution of target genes of TCF12 relative to SIN3A, exhibiting almost no alignment of the 2 distributions.

Figure 5. Heatmap of the Brainspan correlation data for every TF and its target genes. Each square represents the percentage of target genes that have a correlation covered on the x-axis by that square. As can be seen from the color key in the top left of the figure, high percentages are displayed in red and low percentages in black. The range of correlation (-1, 1) was divided into 10 bins to keep the visualization comprehensible. Components with 0% are shown blank. The borders of the bins are displayed on the x-axis, the different TFs along the y-axis. The same cluster algorithm was used as for the previous heatmaps. Again, the clustering is shown in the dendrogram at the left of the figure. The outer correlation regions (-1.0 to -0.8 and 0.8 to 1.0) were left out because none of the TFs had any data within these regions. A representative example of each cluster is displayed on the right.

Figure 6. Distributions of expression correlation values of common TFs. In the histograms, each bin represents the fraction of genes with a correlation value that is covered by the bin on the x-axis. Correlation values were obtained from Brainspan, which performed a Pearson's correlation test on the expression data of each gene. For each graph, the distribution of target genes is displayed in blue, and the distribution of all genes is displayed in red. a, Distributions of expression correlation values of all genes and target genes in the brain relative to SIN3A. b, Distributions of expression correlation values of all genes and random target genes relative to SIN3A. c, Distributions of expression correlation values of the cross-over of TCF12 target genes relative to SIN3A.

Discussion

Our analyses indicate that most zinc fingers show gene regulatory characteristics. Genes with a zinc finger binding site on their promoter region generally have a different expression pattern throughout life relative to that zinc finger than other genes in the brain. In some cases, the correlation distributions are skewed towards the negative or the positive side, which could be a result of the zinc finger having a repressing or activating effect. Since the KRAB domain is known for inducing a repressive epigenetic state, it is interesting to see that for some KZFPs the correlations are skewed to the positive side. These results are in line with the results of the ENCODE transcription factors. The same changes in correlation distributions can be observed for the transcription factors as for the KZFPs. This suggests that KZFPs have a similar effect on their target genes as the more common transcription factors, reinforcing the hypothesis that zinc fingers can regulate gene transcription. The additional cross-over results indicated that target genes are not specific for a KZFP, as the correlation distributions did not follow the distribution of random target genes.

The cross-over analysis indicates that KZFPs may not act completely independently of each other or of other factors. Several explanations for these results are possible. The first is that the target genes of the different KZFPs are all co-regulated by another common TF. An example of this for ZNF202 and ZNF519 is given in figure 7. If the target genes of one KZFP are influenced by the same TF as the target genes of another KZFP, both sets of target genes would encounter the same influences on transcription throughout life. This could explain the similar correlation distributions of different sets of target genes relative to the same KZFP.

Figure 7. Different sets of target genes are influenced by the same TF. Target genes of different KZFPs might be part of a larger set of genes influenced by the same TF.

The second explanation focuses more on a common TF for the KZFPs (figure 8). If different KZFPs are themselves influenced by a common TF, this TF could indirectly influence the transcription of the target genes of the different KZFPs. This could also result in the same sort of correlation distribution for both sets of target genes.

Figure 8. Different KZFPs are influenced by the same TF. Different KZFPs are influenced by the same TF, indirectly influencing the different sets of target genes.

Figure 9. Overlap between target genes of different KZFPs. Some target genes might be influenced by both KZFPs.

A third explanation could be that there is an evolutionary relationship between certain KZFPs, which might result in similar binding sites for these KZFPs, as visualized in figure 9. Similar binding sites would mean that different KZFPs bind to the same promoter regions, possibly affecting the same genes. This would be the most powerful explanation, as the cross-over results would then follow from the target genes of one KZFP being the same as those of the other KZFP. This is, however, not very likely, as only approximately 10% of the genes overlap between most KZFPs.

The final explanation concerns the effect KZFPs might have on their own transcription. In figure 5 of the supplemental data, a graph is shown that represents the binding sites of KZFPs on promoter regions of other analyzed KZFPs. This graph shows different clusters of KZFPs that have each other as target genes. This connectivity between KZFPs could greatly complicate the expression of KZFPs and their corresponding target genes, which could eventually result in the surprising cross-over results. It is also possible that the cross-over results are caused by a combination of all the explanations above to some degree.

The results of these analyses can be influenced by the arbitrarily chosen parameters for both the promoter regions and the MACS-peak cut-off. Promoter regions were defined as 5000 bp upstream and 1000 bp downstream from the transcription start site of the gene. This rule does not apply to all genes, as it is known that the locations of gene promoters can be far more complex (McLean et al., 2010). The MACS-peak cut-off was set at a MACS-peak value score of 500. This value was chosen because the different ChIP-seq samples showed different levels of noise, which made it necessary to choose a high value to make sure that noise peaks were filtered out in all samples. These decisions could have resulted in either including trivial data or excluding relevant data.

Recent data indicated that ZNF675 influences the transcription of genes with a ZNF675 binding site on their promoter region. The results of this research are in line with these findings, showing that the expression throughout life of zinc finger target genes is indeed different from that of other genes. Because this research revealed many possible zinc finger target genes through dry-lab analysis, future research can focus on validating these results in the wet lab. Validating some or many of the effects of zinc fingers on their target genes would reinforce the results of this dry-lab analysis. Another suggestion for future research is to perform the same analysis with better parameters for the promoter regions and MACS-peak cut-offs to obtain more specific and demarcated data.

In conclusion, these results imply that zinc fingers can affect gene transcription when bound to a gene promoter region. However, the cross-over results indicate that more complex mechanisms are involved in the role of KZFPs and their influence on gene transcription.


References

Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: An open source software for explor-ing and manipulating networks. Retrieved from http://www.aaai.org/ocs/index.php/paper/view/154 Brown, R. S. (2005). Zinc finger proteins: getting a grip on rna.

Current opinion in structural biology, 15(1), 94–98.

Consortium, E. P., et al. (2012). An integrated encyclopedia of dna elements in the human genome. Nature, 489(7414), 57–74. Cordaux, R., & Batzer, M. A. (2009). The impact of

retrotrans-posons on human genome evolution. Nature Reviews Genetics, 10(10), 691–703.

Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95.

Imbeault, M., Helleboid, P.-Y., & Trono, D. (2017). KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature, 543(7646), 550–554.

Jacobs, F. M., Greenberg, D., Nguyen, N., Haeussler, M., Ewing, A. D., Katzman, S., et al. (2014). An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature, 516(7530), 242–245.

Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., & Haussler, D. (2002). The human genome browser at UCSC. Genome Research, 12(6), 996–1006.

Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079.

Martin, S., Brown, W. M., Klavans, R., & Boyack, K. W. (2011). OpenOrd: An open-source toolbox for large graph layout. In IS&T/SPIE Electronic Imaging (pp. 786806–786806).

McLean, C. Y., Bristor, D., Hiller, M., Clarke, S. L., Schaar, B. T., Lowe, C. B., et al. (2010). GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnology, 28(5), 495–501.

Miller, J. A., Ding, S.-L., Sunkin, S. M., Smith, K. A., Ng, L., Szafer, A., et al. (2014). Transcriptional landscape of the prenatal human brain. Nature, 508(7495), 199.

Mitchell, P. J., & Tjian, R. (1989). Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science, 245, 371–378. doi: 10.1126/science.2667136

Mudunuri, U., Che, A., Yi, M., & Stephens, R. M. (2009). bioDBnet: The biological database network. Bioinformatics, 25(4), 555–556.

Najafabadi, H. S., Mnaimneh, S., Schmitges, F. W., Garton, M., Lam, K. N., Yang, A., et al. (2015). C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nature Biotechnology, 33(5), 555–562.

R Development Core Team. (2008). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from http://www.R-project.org (ISBN 3-900051-07-0)

Raney, B. J., Dreszer, T. R., Barber, G. P., Clawson, H., Fujita, P. A., Wang, T., et al. (2014). Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics, 30(7), 1003–1005.

Rossum, G. (1995). Python reference manual (Tech. Rep.). Amsterdam, The Netherlands.

Schmitges, F. W., Radovani, E., Najafabadi, H. S., Barazandeh, M., Campitelli, L. F., Yin, Y., et al. (2016). Multiparameter functional diversity of human C2H2 zinc finger proteins. Genome Research, 26(12), 1742–1752.

Sunkin, S. M., Ng, L., Lau, C., Dolbeare, T., Gilbert, T. L., Thompson, C. L., et al. (2013). Allen Brain Atlas: An integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Research, 41(D1), D996–D1008.

Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Springer New York. Retrieved from http://had.co.nz/ggplot2/book


Supplemental data

Table 1

Importance potential score table of all zinc fingers

Zinc finger   Expression   Primate specific   Number of target genes   Important genes   Total score
znf263        A            D                  A                        x                 9
znf282        A            D                  A                        x                 9
znf429        B            A                  C                        .                 9
znf44         B            A                  C                        x                 9
znf441        C            A                  B                        x                 9
znf506        B            A                  C                        .                 9
znf519        D            A                  A                        x                 9
znf534        D            A                  A                        x                 9
znf610        B            A                  C                        .                 9
znf675        A            A                  D                        .                 9
znf28         C            A                  C                        x                 8
znf468        C            A                  C                        x                 8
znf611        C            A                  C                        .                 8
znf680        B            A                  D                        .                 8
znf141        D            A                  C                        .                 7
znf202        C            D                  A                        x                 7
znf273        D            A                  C                        .                 7
znf284        D            A                  C                        .                 7
znf320        C            A                  D                        .                 7
znf343        D            A                  C                        .                 7
znf436        A            D                  C                        .                 7
znf490        C            A                  D                        .                 7
znf543        C            A                  D                        .                 7
znf549        C            A                  D                        .                 7
znf695        D            A                  C                        .                 7
znf736        D            A                  C                        .                 7
znf135        A            D                  D                        .                 6
znf182        B            D                  C                        x                 6
znf189        A            D                  D                        .                 6
znf257        D            A                  D                        x                 6
znf266        A            D                  D                        .                 6
znf445        A            D                  D                        .                 6
znf554        B            D                  C                        .                 6
znf563        D            A                  D                        .                 6
znf671        A            D                  D                        .                 6
znf765        D            A                  D                        .                 6
znf808        D            A                  D                        x                 6
znf823        D            A                  D                        .                 6
znf324        B            D                  D                        .                 5
znf479        D            D                  B                        x                 5
znf783        D            D                  B                        .                 5
znf786        C            D                  C                        x                 5
znf793        C            D                  C                        x                 5
znf180        C            D                  D                        .                 4
znf425        D            D                  C                        .                 4
znf485        D            D                  C                        .                 4
znf674        D            D                  C                        .                 4
znf778        C            D                  D                        .                 4
znf283        D            D                  D                        .                 3
znf317        D            D                  D                        .                 3
znf75D        D            D                  D                        x                 3
znf84         D            D                  D                        .                 3

Every zinc finger was assigned a score for every attribute, reflecting the degree to which it has this attribute. A = 4 points, B = 3 points, C = 2 points, D = 1 point. The 'Important genes' column shows whether any target genes of that zinc finger are involved in important pathways (marked with an x).
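The scoring rule of Table 1 can be written out as a short Python sketch. In the table, the totals equal the sum of the three graded columns (for example znf263: 4 + 1 + 4 = 9), so the 'Important genes' flag appears not to contribute to the total; the sketch follows that reading. The names `GRADE_POINTS` and `total_score` are hypothetical, and only four rows of the table are reproduced.

```python
# Points per grade, as stated in the note to Table 1.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}


def total_score(expression, primate_specific, n_target_genes):
    """Sum the points of the three graded attributes of one zinc finger."""
    grades = (expression, primate_specific, n_target_genes)
    return sum(GRADE_POINTS[grade] for grade in grades)


# A few rows copied from Table 1:
# (expression, primate specific, number of target genes)
rows = {
    "znf263": ("A", "D", "A"),
    "znf675": ("A", "A", "D"),
    "znf28": ("C", "A", "C"),
    "znf84": ("D", "D", "D"),
}

totals = {name: total_score(*grades) for name, grades in rows.items()}
```

For these four rows the computed totals are 9, 9, 8 and 3, matching the 'Total score' column.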

Panels: (a) znf28, (b) znf44, (c) znf75D, (d) znf84, (e) znf135, (f) znf141, (g) znf180, (h) znf182, (i) znf189, (j) znf202, (k) znf257, (l) znf263, (m) znf266, (n) znf273, (o) znf282, (p) znf283, (u) znf343, (v) znf425, (w) znf429, (x) znf436, (y) znf441, (z) znf445, (aa) znf468, (ab) znf479, (ac) znf485, (ad) znf490, (ae) znf506, (af) znf519, (ag) znf534, (ah) znf543, (ai) znf549, (aj) znf554, (ak) znf563, (al) znf610, (am) znf611, (an) znf671, (as) znf736, (at) znf765, (au) znf783, (av) znf786, (aw) znf793, (ax) znf808, (ay) znf823

Figure 1. Histograms of all zinc fingers.

The distribution of the expression correlation values of all genes in the brain relative to the zinc finger is displayed in red. The distribution of the expression correlation values of the target genes relative to the zinc finger is displayed in blue.

Panels: (a) BHLHE40, (b) CTCF, (c) EP300, (d) EZH2, (e) FOXA1, (f) POLR2A, (g) RAD21, (h) SIN3A, (i) SIX5, (j) SP1, (k) TBP, (l) TCF12

Figure 2. Histograms of all transcription factors.

The distribution of the expression correlation values of all genes in the brain relative to the TF is displayed in red. The distribution of the expression correlation values of the target genes relative to the TF is displayed in blue.
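The two distributions shown in each histogram panel could be obtained along the following lines. This is a minimal Python sketch, not the thesis code: it assumes that the correlation is a Pearson correlation computed across matched samples (for example ages), which the captions do not specify, and the names `pearson` and `correlation_distributions` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def correlation_distributions(regulator, expression, targets):
    """Correlate every gene with the regulator; split off the target genes.

    regulator  -- expression profile of the zinc finger or TF
    expression -- dict mapping gene name to its expression profile
    targets    -- set of gene names with a binding site in their promoter
    """
    all_r = {gene: pearson(regulator, profile)
             for gene, profile in expression.items()}
    target_r = {gene: r for gene, r in all_r.items() if gene in targets}
    return all_r, target_r
```

The values of `all_r` would correspond to the red distribution and the values of `target_r` to the blue distribution in each panel.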


Figure 3. Heatmap of BrainSpan VFC expression data for every zinc finger with >50 binding sites in promoter regions. Low expression is displayed in green, high expression in red. The order of the zinc fingers was determined by a clustering algorithm that groups zinc fingers with similar expression patterns. This clustering is represented by the dendrogram on the left of the image. Ages are displayed on the x-axis and the zinc fingers on the y-axis. Expression values were calculated by counting all nucleotides from reads that overlap with a given annotation entry. Subsequently, this value was normalized per million mapped nucleotides and per kilobase of item length (Miller et al., 2014).


Figure 4. Heatmap of BrainSpan DFC expression data for every zinc finger with >50 binding sites in promoter regions. Low expression is displayed in green, high expression in red. The order of the zinc fingers was determined by a clustering algorithm that groups zinc fingers with similar expression patterns. This clustering is represented by the dendrogram on the left of the image. Ages are displayed on the x-axis and the zinc fingers on the y-axis. Expression values were calculated by counting all nucleotides from reads that overlap with a given annotation entry. Subsequently, this value was normalized per million mapped nucleotides and per kilobase of item length (Miller et al., 2014).
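The normalization described in the captions of Figures 3 and 4 can be written as a small Python function. This is a sketch of the stated rule only, under the assumption that the two normalizations are applied as successive divisions; the function and parameter names are hypothetical and the numbers in the usage note are invented for illustration.

```python
def normalized_expression(overlap_nt, total_mapped_nt, feature_length_bp):
    """Nucleotide count per million mapped nucleotides, per kilobase.

    overlap_nt        -- nucleotides from reads overlapping the annotation entry
    total_mapped_nt   -- all mapped nucleotides in the sample
    feature_length_bp -- length of the annotation entry in base pairs
    """
    if total_mapped_nt <= 0 or feature_length_bp <= 0:
        raise ValueError("totals and lengths must be positive")
    per_million = overlap_nt / (total_mapped_nt / 1_000_000)
    return per_million / (feature_length_bp / 1_000)
```

For example, 2,000,000 overlapping nucleotides in a sample with 4,000,000,000 mapped nucleotides, on an entry of 2,000 bp, give 2,000,000 / 4,000 / 2 = 250.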


Figure 5. Network graph of zinc fingers binding on promoter regions of other analyzed zinc fingers.

Network of all zinc fingers that interact with other analyzed zinc fingers. Higher MACS-peak values are indicated by thicker and darker edges. Nodes and edges were arranged with the OpenOrd layout algorithm in Gephi 0.9.1.
