• No results found

Analysis of the mRNA-lncRNA-miRNA network in clear cell renal cell carcinoma

N/A
N/A
Protected

Academic year: 2021

Share "Analysis of the mRNA-lncRNA-miRNA network in clear cell renal cell carcinoma"

Copied!
32
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

1

Analysis of the mRNA-lncRNA-miRNA network in clear cell renal cell

carcinoma

Author: Niek van der Zee (e-mail: niekvanderzee2712@gmail.com, student ID: 11642866), supervisor: dr. Marten Postma (e-mail: m.postma@uva.nl), second assessor: prof. dr. Eric Eldering (e-mail: e.eldering@amsterdamumc.nl).

Abstract

Clear cell renal cell carcinoma (CCRCC) is the most prevalent type of renal cell carcinoma and has the worst prognosis. Currently, the main treatment is removal of the tumour through surgery, which is particularly difficult in more progressed tumours. In search of alternative approaches, a better understanding of molecular mechanisms in CCRCC tumour formation is paramount to the development of more targeted therapies. One interesting and promising class of molecules are the so-called regulatory RNAs. In particular, long non-coding RNAs (lncRNAs) that interact with micro RNAs (miRNAs), which in turn are involved in post-transcriptional regulation of messenger RNAs (mRNAs). Long non-coding RNAs (lncRNAs) interact with micro RNAs (miRNAs), which are involved in the post-transcriptional regulation of mRNAs. Perturbation of these mRNA-lncRNA-miRNA interactions could increase or reduce the expression of oncogenes and tumour suppressor genes, respectively, contributing to the formation of cancer. Therefore, this study focusses on the identification of the key mRNAs, lncRNAs and miRNAs in CCRCC. High-throughput mRNA, lncRNA and miRNA sequencing data of paired CCRCCs and adjacent normal tissues, obtained from The Cancer Genome Atlas (TCGA), were analysed using R. Paired differential expression analysis yielded 4861 mRNA, 2131 lncRNA and 140 precursor miRNA candidates in CCRCC. miRNA-mRNA network analysis suggested that 755 mRNAs appear to be upregulated linked to 91 downregulated miRNAs, thereby evading miRNA-directed degradation. Conversely, 934 mRNAs appear to be downregulated in relation to miRNA-directed degradation by 56 upregulated miRNAs. KEGG and GO pathway analysis suggested involvement in angiogenesis/hypoxia response, by absence of degradation of VEGF and TGFβ directed by among other the micro-RNA-200 family. In summary, RNAs may play an important and interesting role in the development and progression of clear cell renal cell carcinomas. This analysis yielded putative molecular targets; however more extensive research is needed for the development of treatments based on RNA interactions.

Introduction

Renal cell carcinoma (RCC) is a renal cancer type and accounts for 90% of all kidney cancer cases (The American Cancer Society, 2019). The progression of RCC tumour formation is often asymptomatic. Therefore, most reported cases are detected incidentally during imaging tests such as X-ray, CT, or MRI for unrelated clinical indications. The incidental cases of RCC have a better prognosis than non-incidental cases because tumours of non-non-incidental cases, which in general have progressed more, this is also known as the lead-time bias (Al-Marhoon et al., 2011). Interestingly, RCC is slightly more prevalent among men, as 60% of RCC cases are men (American Cancer Society, 2019). The three most prevalent subtypes of RCC are clear cell renal cell carcinomas (CCRCC), papillary renal cell carcinomas (PRCC) and chromophobe renal cell carcinomas (ChRCC), of which CCRCC accounts for 70-90% of all RCCs and has the worst prognosis with a 5-year survival rate of approximately 70% (Cheville et al., 2003). CCRCC derives from the lining of the proximal tubule of the nephron and is characterized by its histologic appearance, the tumour cells have a clear cytoplasm caused by an accumulation of glycogen and lipids (Motzer et al., 1996; Gebhard et al., 1987). CCRCC tumours can be divided into four different stages according to their size and their migration within or surrounding the kidneys (Figure 1). The stage of the tumour has a significant effect on the prognosis for the patient. The 5-year survival rates

(2)

2 for localized, regional, and distant tumours are 93%, 70%, and 12%, respectively (The American Cancer Society, 2020). The common treatment for CCRCC is removal of the tumour or the whole kidney by surgery. However, stage IV patients can only be candidates for surgery if they show minimal regional adenopathy, i.e. disease of the lymph. In addition, most patients with metastases to the lymph nodes will relapse after surgery, despite of the removal of lymph nodes, highlighting the need for alternative treatments (Motzer et al.,2009). Alternative treatment consists of systemic therapies like immune- or targeted therapies. The standard first-line systemic therapy for RCC is the VEGF inhibitor sunitinib (Schmid et al., 2016). Recently, an immune therapy consisting of a combination of nivolumab (a PD-L1 inhibitor) and ipilimumab (an antibody for CTLA-4) showed a 30-month overall survival of 60% compared to 47% for sunitinib (Motzer et al., 2019). For development of more systemic therapies in CCRCC, a better understanding of the molecular mechanisms involved in the formation of CCRCC tumours is necessary.

Figure 1. The staging of renal cell carcinomas. For the National Cancer Institute © 2018 Terese Winslow LLC, U.S. Govt. has certain rights1.

The molecular mechanisms in cancer development are often the result of activation/upregulation of oncogenes or deactivation/downregulation of tumour suppressor genes by specific DNA mutations (Koeffler et al.,1991). For example, Cancer Genome Atlas Research Network (2013) reported that mutated genes in CCRCC were involved in the cellular oxygen sensing VHL/HIF pathway, the chromatin remodelling/histone methylation pathway, and the PI3K/AKT pathway. They further stated that the PI3K/AKT pathway presented a strong candidate for therapeutic targets in CCRCC. Interestingly, similar to proteins, regulatory RNAs can act as oncogenic or tumour suppressive (Acunzo et al., 2015). Micro RNAs (miRNAs) are non-protein coding and have a length of approximately

(3)

3 21-25 nucleotides (Ambros et al., 2003). Most miRNAs are produced by splicing of precursor miRNA hairpin-loops, resulting in a mature 3’ and a 5’ miRNA. miRNAs are for the most part involved in downregulating messenger RNA (mRNA) translation by targeting mRNAs for destruction (Ha, et al., 2014). Therefore, increased, or decreased expression of miRNAs could lead to the dysregulation of oncogenic pathways, contributing to the formation and progression of tumours. For example, Pan et al. (2018) suggested that miR-193a-3p and miR-224 play an important role in regulation of RCC by targeting ST3GalIV via the PI3K/Akt pathway. Long non-coding RNAs are RNAs with a length of more than 200 nucleotides and have several interactions with miRNAs. LncRNAs can act as decoys for miRNAs reducing the regulatory effect of miRNAs on mRNAs, they can be degraded by miRNAs, they can compete with miRNAs for the binding of mRNAs and the splicing of lncRNAs could produce mature miRNAs (Yoon et al., 2014). The interactions between mRNAs, lncRNAs and miRNAs may form an interesting and complex network of post-transcriptional regulation (Figure 2). Network analysis based on RNAseq of lncRNA, miRNA and mRNA has been applied to several cancer types, including breast cancer (Xiao et al., 2018), colon cancer (Huang et al., 2019) and prostate cancer (He et al., 2018). Xiao et al. (2018) and Huang et al. (2019) used the network analysis to gain insights into the mRNAs, lncRNAs and miRNAs associated with breast cancer and colon cancer prognosis, respectively. Whereas He et al. (2018) performed a network analysis to gain insights into RNAs and pathways that may be involved in the pathogenesis prostate cancer.

Figure 2. The interactions between mRNAs, lncRNA and miRNAs as reported by Ha, et al. (2014) & Yoon et al. (2014). Created with BioRender.com.

Mimicking RNA interactions has led to the development of the first RNA interference drug approved by the US Food and Drug Administration called patisiran (Setten et al., 2019). Patisiran targets a sequence within the transthyretin (TTR) messenger RNA thereby reducing the production of misfolded TTR in hereditary transthyretin-mediated amyloidosis. In theory, RNA interference could be implemented into the mRNA-lncRNA-miRNA network reducing the production of oncogenes or

(4)

4 indirectly stimulate the production of tumour suppressors. Therefore, a better understanding on the mechanisms of mRNAs, lncRNAs and miRNAs in the carcinogenesis of CRCC is necessary for the development of new targeted therapies for CCRCC.

Several studies on the interactions between mRNAs, lncRNAs and miRNAs in CCRCC have already been carried out. The Cancer Genome Atlas Research Network (2013) reported that CCRCC tumours could be divided into four subtypes according to mRNA and miRNA expression patterns. They further stated that the characteristic marker for the worst prognosis subtype of CCRCC was the miRNA miR-21, which is involved in PI3K signalling by downregulating PTEN. Liu et al. (2010) reported that a loss of miR-149, miR-200c and mir-141 causes gain of function of oncogenes (KCNMA1, LOX), VEGFA and SEMA6A, respectively; increased levels of miR-142-3p, miR-185, mir-34a, miR-224, miR-21 cause loss of function of tumour suppressors LRRC2, PTPN13, SFRP1, ERBB4, and (SLC12A1, TCF21) respectively. Zhu et al. (2018) reported four lncRNAs (ENTPD3-AS1, FGD5-AS1, LIFR-AS1, and UBAC2-AS1) as potential therapeutic targets as well as prognostic biomarkers of CCRCC. The current study will focus on the identification of candidate mRNAs, lncRNAs or miRNAs as therapeutic targets in CCRCC.

This will be accomplished by analysis of a large public dataset containing next generation RNA sequencing data for CCRCC patients using bioinformatics “R” packages (R Core Team, 2017) (Figure 3). First, the heterogeneity among the samples needs to be explored to ensure fair comparison. Therefore, a principal component analysis (PCA) is used to reduce the high dimensionality of the RNA count data, enabling visualization of the variability. Also, the similarity of samples is visualized using a hierarchical clustering based on RNA expression correlation patterns. Second, it is of interests what mRNAs, lncRNAs ,and miRNAs are expressed differentially in CCRCC tumours compared to normal tissue. For differential expression two mainstream R-packages are available EdgeR and DESeq2 (Robison et al.,2010; Love et al., 2014), in the study DESeq2 was used. Before differentially expression analysis, the RNA counts need to be normalized for library size and library composition, because RNA sequencing does not produce an equal total amount reads for every sample and a depletion of a gene automatically increases the reads for other genes. Third, the upregulated and downregulated miRNAs that target mRNAs for destruction will be retrieved for interaction network reconstruction. Finally, it is of interest what the functions are of the mRNAs targeted by miRNAs. This will be investigated by a KEGG (Kanehisa et al., 2000) and GO (Ashburner et al., 2000) pathway enrichment analysis using DAVID v.6.8 (Huang et al., 2009). It is expected that the results of this study will reveal putative mRNA, lncRNA and miRNA candidates for therapeutic targets in CCRCC.

(5)

5 Figure 3. The data analysis flowchart for analysis of mRNA-lncRNA-miRNA network in clear cell renal cell carcinoma. (1), Differential expression analysis of RNA counts using “DESeq2”. (2), Retrieval of putative miRNA-mRNA pairs using “multimer”. (3), Validating miRNA-mRNA pairs by correlation analysis. (4), Focussing on the 10 upregulated and 10 downregulated miRNAs targeting the highest number of mRNAs. (5), KEGG and GO pathway enrichment analysis of the target mRNAs.

Methods & materials

RNA expression quantification- and meta data.

The Cancer Genome Atlas (TCGA, https://www.cancer.gov/tcga), hosted on The NCI Genomic Data Commons (GDC, https://portal.gdc.cancer.gov), provides bioinformatic data of primary cancer and matched normal samples for 33 cancer types. RNA high-throughput sequencing (HTseq) read counts and metadata for kidney renal clear cell carcinomas (KIRC, https://portal.gdc.cancer.gov/projects/TCGA-KIRC) and adjacent normal kidney tissues, taken 2cm away from the tumours, were retrieved using the R package “TCGAbiolinks” (Colaprico et al., 2015; Silva et al., 2016; Mounir et al., 2019; 10.18129/B9.bioc.TCGAbiolinks). The mRNA and lncRNA expression dataset contained paired normal-tumour tissue data for 72 patients, whereas the miRNA expression dataset contained paired normal-tumour tissue data for 71 patients. By a paired design of analysis, the patient-level confounder can be estimated out, thereby increasing the statistical power (Stevens et al., 2018). Note that the TCGA-KIRC miRNA expression dataset combines the reads of the 3p and 5p mature miRNAs and assigns them to their common precursor miRNA hairpin-loop IDs. Most mature miRNAs are produced from splicing of precursor miRNA hairpins-loops, resulting in two mature miRNAs: one from the 3’-hydroxyl-end (3p) and one from the 5’-phosphate-end. Later, mature miRNA IDs will be reconstructed.

(6)

6 Exploration and visualization of the high dimensional expression data samples.

The goal of differentially expression analysis is to retrieve the mRNAs, lncRNAs, and miRNAs that are characteristic for the difference between normal and CCRCC tumour tissue. To consider the possibility that other factors than cancerous versus tumour tissue type explain the differentially expressed RNAs, the heterogeneity of the samples needs to be checked. The heterogeneity of the samples in the TCGA-KIRC data sets was explored with a principal component analysis (PCA) using the R function “prcomp” on mRNA, lncRNA and miRNA HTseq counts. The HTseq counts were normalized by variance stabilizing transformation prior to PCA using the function “vst” from the R package “DESeq2” (Love et al., 2014;

10.18129/B9.bioc.DESeq2). This normalization method takes into account the differences in library size

among samples. To explore the similarity of the samples, Pearson correlations were calculated between all samples based on normalized RNA expression using the R function “cor” followed by a hierarchical clustering using the R function “hclust”. Finally, a clustered correlation matrix was plotted using the function “aheatmap” from the “NMF” package (Gaujoux et al., 2010; https://cran.r-project.org/web/packages/NMF).

Identification of differentially expressed mRNAs, miRNAs and lncRNAs in CCRCC.

The differentially expressed mRNAs, lncRNAs, and precursor miRNAs in kidney renal clear cell carcinomas were identified by paired analysis of tumour tissue and adjacent normal tissue using the R package “DESeq2” (Love et al., 2014, 10.18129/B9.bioc.DESeq2). A paired design detects RNAs that are differentially expressed in CCRCC compared to normal for each patient separately, adjusting for baseline differences between the patients (Chen et al., 2020). Only RNAs with a read count of 10 or more in at least 10 tissue samples were included because genes that express less are considered to have no effect and their deletion increased the speed of the differential expression calculation. The log2-foldchanges (LFC) were shrunken by the R package ”apeglm” to reduce the variance of RNAs with low read counts (Zhu et al., 2018; 10.18129/B9.bioc.apeglm). The cut-offs for the differentially expressed RNAs were defined as: LFC < -1 or > 1; Benjamini-Hochberg adjusted p-value (Benjamini & Hochberg, 1995) of p < 0.01. The differentially expressed mRNAs, lncRNAs and precursor miRNAs were visualised with a volcano plot (Li, 2012).

The expression of differentially expressed RNAs among samples was explored using RNA expression heatmaps. First, the sample columns and gene rows were hierarchical clustered based on their Pearson correlation coefficients using the R functions “cor” and “hclust”. Then, the vst normalized expression for all differentially expressed RNAs were plotted in a heatmap ordered by the hierarchical clustering using the function “aheatmap” from the “NMF” package (Gaujoux et al., 2010; https://cran.r-project.org/web/packages/NMF).

mRNA-miRNA network construction analysis.

Validated interactions between upregulated miRNAs and downregulated mRNAs and vice versa were retrieved from the following databases: tarbase v.8 Karagkouni et al., 2018), mirtarbase v.7.0 (Chou et al., 2018), mirdb v.6 (Chen et al., 2020; Liu et al., 2019) and targetscan v.7.2 (Agarwal et al., 2015) using the “multiMiR” package (Ru et al., 2014; 10.18129/B9.bioc.multiMiR). The TCGA-KIRC miRNA expression dataset combines the counts of mature miRNAs and assigns them to precursor miRNA hairpin-loop IDs, but mature miRNA IDs are necessary for the retrieval of miRNA-mRNA interactions. Therefore, mature miRNA IDs were reconstructed by simply removing the chromosomal site-specific part of the miRNA ID and adding “3-p” and “5-p” to the end of the miRNA IDs. Note that this assumes that mature miRNAs derived from the same precursor miRNA have a similar expression. To validate the putative miRNA-mRNA interactions, the correlation between miRNA and target mRNA expression was calculated among 142 samples, consisting of paired normal and tumour tissues derived from 71

(7)

7 patients, using the R function “cor”. Only miRNA-mRNA interactions with a Pearson correlation coefficient of -0.5 or less were kept, meaning that the expression of the miRNAs is negatively correlated with target mRNA expression among the samples.

In general, in networks analysis, nodes with the highest amount of edges are considered as most important (McKnight, 2014, p. 127). In the case of this study the nodes represent miRNAs or mRNAs and the edges the interactions between miRNA and mRNAs. Therefore, the miRNAs with the highest number of target mRNAs were assumed to be most important. The interactions of ten upregulated and ten downregulated miRNAs with the highest number of connections and their target mRNAs in CCRCC were visualised using cytoscape v.3.8.0 (Shannon et al., 2003). To check if some of these miRNAs co-regulate target mRNAs, i.e. they share a large fraction of targets, the amount of shared mRNA targets between miRNAs was divided by the average of the number of targets of those two intersecting miRNAs. These normalized fractions of similar mRNA targets were plotted in a heatmap using the function “aheatmap” from the “NMF” package (Gaujoux et al., 2010; https://cran.r-project.org/web/packages/NMF).

lncRNA-miRNA network construction analysis.

The presence or absence of miRNA-directed degradation of lncRNAS in CCRCC could have been analysed using DIANA-LncBase v.3 (Karagkouni et al., 2020; Paraskevopoulou et al.¸ 2018; www.microrna.gr/LncBase) for putative target prediction, followed by a correlation analysis between lncRNA and miRNA expression. Unfortunately, due to time constraints miRNA-directed degradation of lncRNAs could not be studied. In addition, lncRNA acting as decoys thereby repressing miRNAs or splicing of lncRNAs into miRNAs could not be studied using the RNA expression datasets.

Pathway enrichment analysis.

To gain insight in the functions of the mRNAs targeted by miRNAs with high number of interactions in CCRCC a Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al., 2000; https://www.genome.jp/kegg/) and a Gene Ontology (GO) (Ashburner et al., 2000; http://geneontology.org/) pathway enrichment analysis was performed using Database for Annotation, Visualization and Integrated Discovery (DAVID) v.6.8 (Huang et al., 2009; https://david.ncifcrf.gov/). Lists of the target mRNA ensembl IDs, i.e. gene lists were submitted to DAVID, which retrieves functional annotate for the gene list from KEGG and GO and calculates if multiple genes within the gene list belong to the same pathway. A GO- or KEGG pathway was considered enriched if the Benjamin-Hochberg adjusted p-value was 0.05 or less. The mRNAs involved in the KEGG pathway “Pathways in cancer” were mapped using KEGG Mapper – Search&Color Pathway (Kanehisa et al., 2020; www.genome.jp/kegg/tool/map_pathway2.html).

Results

Exploration and visualization of the TCGA-KIRC RNA-sequencing datasets.

Before differential expression analysis is performed between the two group samples (normal vs tumour) it is advisable to check the variance in the whole dataset, for reliability and sensitivity reasons the within group variance should be small compared to the between groups variance. The cancer genome atlas Research Network (2013) already reported that there were no significant batch effects, i.e. variation induced by non-biological factors, in the TCGA-KIRC dataset for mRNA, lncRNA and miRNA expression. To explore grouping and clustering of the samples a principal component analysis (PCA) was used (Ringnér, 2008) (see “Exploration and visualization of the high dimensional expression data samples” in materials & methods). A PCA reduces the dimensions, i.e. the large amount of RNA expression data, to a few principal components. Each principal component explains a part of the total variance in the data, with the most important being the first principal component. For mRNA, 29

(8)

8 principal components are necessary to explain 80% of the total variant; for lncRNA, 56 principal components are necessary to explain 80% of the total variant; for miRNA, 27 principal components are necessary to explain 80% of the total variant (Appendix A). The results show that the first principal component appears to be related to the separation of most samples along normal and tumour tissue type groups with respect to mRNA, lncRNA and miRNA expression (Figure 4).Three tumour samples appear to group with normal tissue samples according to mRNA and lncRNA expression, but not according to miRNA expression. Whereas one normal tissue sample appears to group with tumour tissue samples according to mRNA and lncRNA expression, but not according to miRNA expression. The second principal component only reveals an outlier for the tumour samples based on mRNA expression, whereas the third principal component exhibits significant variation in the normal tissue group (Figure 4A), suggesting some degree of expression heterogeneity in the normal tissue samples. The second principal component also shows more variability in normal tissue than tumour tissue for miRNA- and lncRNA expression, whereas the third principal component shows no significant difference between the sample types (Figure 4B & Figure 4C). The other principal components could not be explained by metadata obtained from the TCGA-KIRC dataset (Figures not shown).

To further show the similarity between samples, Pearson correlations between samples were hierarchical clustered and plotted in a heatmap (Figure 5) (see “Exploration and visualization of the high dimensional expression data samples” in materials & methods). Again, most samples group according to their tissue type, with some misplacements. Interestingly, the normal tissue samples for all RNA expression datasets can be divided into a large group and a smaller group with a ratio of 5:2, respectively. In addition, there appears to be a sub-group within the large group that correlate with the small subgroup. For miRNA expression the larger subgroup could even be further divided into 4 subgroups, resulting in a total of five subgroups (Figure 5A).

In conclusion, according to the PCA and hierarchical clustered Pearson correlation matrix, there is a significant biological difference between both tissue types in the mRNA, lncRNA, and the miRNA expression datasets from TCGA-KIRC. Notably, the heterogeneity in the tumour samples is minimal, whereas the normal tissue shows sub-grouping. A paired differential expression analysis will be performed using “DESeq2” to consider for the patient-level confounder.

(9)

9 Figure 4. Principal component analysis (PCA) of the clear cell renal cell carcinoma- (grey colour) and normal tissue (orange colour) samples according to their mRNA (A), lncRNA (B) and miRNA (C) expression. A B C mRNA expression lncRNA expression miRNA expression

(10)

10 Figure 5. Hierarchical clustering combined with Pearson correlation matrix of the clear cell renal cell carcinoma (red label) and normal tissue (white label) samples according to mRNA (A), lncRNA (B) and miRNA (C) expression. Blue, high Pearson correlation coefficient; white, low Pearson correlation coefficient.

Differentially expression analysis of mRNAs, miRNAs and lncRNAs in clear cell renal cell carcinoma (CCRCC).

It is of interests what mRNAs, lncRNAs ,and miRNAs are expressed differentially in CCRCC tumours compared to normal tissue. Differential expression analysis of paired clear cell renal cell carcinoma and normal kidney tissue (see “Identification of differentially expressed mRNAs, miRNAs and lncRNAs in CCRCC” in materials & methods) found: 4861 mRNAs, among which 2167 were upregulated and 2694 downregulated (Figure 6A); 2131 lncRNAs, 791 were upregulated and 1340 downregulated (Figure 6B); 140 miRNAs, 37 were upregulated and 103 downregulated (Figure 6C). Note that the differentially expressed miRNAs represent precursor miRNA stem-loops. The most significant differentially expressed mRNAs, lncRNAs and miRNAs are shown in Appendix B, C and D, respectively. For

A B

C

mRNA expression lncRNA expression

(11)

11 comprehensive information on the differentially expressed mRNAs, lncRNAs and miRNAs, see Supplemental Excel Dataset 1.

Visualization of the expression of differentially expressed RNAs shows that the expression patterns appear to be similar among tumour tissue, whereas the normal tissue shows subgrouping (Figure 7). mRNA expression shows a difference in amount of upregulation between patients, for downregulation of mRNAs the subgroups of patients differ in which mRNAs are downregulated (Figure 7A). Likewise, lncRNA and miRNA expression mostly differ between patients in amount of upregulation between patients, while downregulation of lncRNAs and miRNAs differ between patients in which lncRNAs are downregulated (Figure 7B and 7C).

Taken together, there appears to be more downregulation of mRNAs, lncRNAs and miRNAs than upregulation in CCRCC compared to normal tissue. The results also show that samples differ in amount of mRNA, lncRNA and miRNA upregulation and in which mRNAs, lncRNAs and miRNAs are downregulated.

Figure 6. Differentially expressed mRNAs (A), lncRNAs (B) and miRNAs (C) in clear cell renal cell carcinoma compared to adjacent normal tissue. The cut-offs were: log2-foldchange (LFC) <= -1 or >= 1;

Benjamini-Hochberg adjusted p-value <= 0.01. 400 Up = 37 Down = 103 Up = 2167 Down = 2694 Down = 1340 Up = 791 A B C -l o g10(pad j) -l o g10 (p ad j) -l o g1 0( p ad j) Log2FoldChange Log2FoldChange Log2FoldChange 300 200 100 0 -15 -10 -5 0 5 10 200 0 100 -10 -5 0 5 10 100 200 300 0 -8 -4 0 4

Differential expressed mRNAs Differential expressed lncRNAs

(12)

12 Figure 7. Normalized expression patterns of differentially expressed mRNAs (A), lncRNAs (B), and miRNAs (C) in clear cell renal cell carcinoma. The columns represent the samples which are annotated with a colour corresponding to a tissue type. The rows represent the differentially expressed RNAs. Columns and rows were hierarchical clustered according to their Pearson correlation coefficients. Blue, low expression; red, high expression.

Network reconstruction and analysis.

It is of interest which of the differentially expressed miRNAs might downregulate or upregulate differentially expressed mRNAs in CCRCC by presence or absence of miRNA-directed degradation, respectively. Analysis of the interaction network between miRNAs and mRNAs (see “mRNA-miRNA network construction analysis” in materials & methods) revealed 1765 interactions between 91 downregulated miRNAs and 755 upregulated mRNAs, whereas 1935 interactions between 56 upregulated miRNAs and downregulated 934 mRNAs were found. Among these interactions 414 downregulated and 464 upregulated mRNAs were targeted by two or more miRNAs. For the complete list of miRNA-mRNA interactions, see Supplemental Excel Dataset 2. The interaction networks consisted of ten down- and upregulated miRNAs with the highest number of target mRNAs (Table 1; Figure 8). The ten downregulated miRNAs possibly regulate 590 of the upregulated mRNAs (Figure 8A), whereas the ten upregulated miRNAs might regulate 802 of the downregulated mRNAs (Figure 8B). Among these interactions 210 downregulated and 326 upregulated mRNAs are targeted by two or more of the ten upregulated or the ten downregulated miRNAs, respectively. To show which miRNAs

(13)

13 might co-regulate target miRNAs, the number of shared target mRNAs between intersecting miRNAs were determined and divided by the average number of mRNA targets of the intersecting miRNAs. Only hsa-miR-200b shows a noticeable fraction2 of shared mRNA targets of 0.30 and 0.29 with

hsa-miR-429 and hsa-miR-200c-3p, respectively (Figure 9).

In conclusion, there is a difference in amount of mRNA targets among miRNAs, for example hsa-miR-155-5p might target up to 293 mRNAs for degradation. In addition, some mRNAs are targeted by multiple miRNAs, which might be an indication of cooperation between miRNAs. However, only hsa-mir-200b appears to have some consistent co-regulation with hsa-miR-429 and hsa-miR-200c-3p. Table 1. The top ten of downregulated and upregulated miRNAs based on highest number of target mRNAs in clear cell renal cell carcinoma.

2 Note that by fraction the following is meant: the number of shared target mRNAs between miRNAs divided by

the average of mRNA targets of these intersecting miRNAs.

Downregulated miRNA IDs Degree Upregulated miRNA IDs Degree 1 hsa-miR-10b-5p 173 hsa-miR-155-5p 293 2 hsa-miR-200b-3p 136 hsa-miR-210-3p 189 3 hsa-miR-429 128 hsa-miR-15a-5p 185 4 hsa-miR-200c-3p 114 hsa-miR-21-3p 150 5 hsa-miR-30b-5p 104 hsa-miR-122-5p 148 6 hsa-miR-203a-3p 60 hsa-miR-106b-5p 144 7 hsa-miR-362-3p 56 hsa-miR-21-5p 86 8 hsa-miR-141-5p 47 hsa-miR-224-5p 74 9 hsa-miR-106a-5p 45 hsa-miR-181b-5p 52 10 hsa-miR-30b-3p 45 hsa-miR-342-3p 46

A Downregulated miRNAs and B

upregulated mRNAs

Upregulated miRNAs and downregulated mRNAs

(14)

14 Figure 8. The interactions of downregulated miRNAs with upregulated mRNAs (A) and of upregulated miRNAs with downregulated mRNAs (B). Only the interactions between the ten miRNAs with highest number of interactions with target mRNAs are shown. Blue nodes represent upregulated RNAs, red nodes represent downregulated RNAs and the edges represent the interactions between the RNAs. Interactions have a Pearson correlation of -0.5 or less.

Figure 9. The number of shared target mRNAs, between ten upregulated and ten downregulated miRNAs with the greatest number of mRNA targets, divided by the average number of mRNA-targets of the two intersecting miRNAs. Blue, low score; white intermediate score; red, high score.

Pathway enrichment analysis of target genes.

It is of interest if the miRNAs interacting with mRNAs have an influence in the oncogenesis in CCRCC. To explore the pathways that target mRNAs of the ten downregulated and ten upregulated miRNAs with the highest number of target mRNAs are involved in, a KEGG and GO pathway enrichment analysis was performed (see “Pathway enrichment analysis” in materials & methods). KEGG pathway enrichment suggests that the target mRNAs are involved in metabolism, e.g. carbon metabolism, propanoate metabolism, biosynthesis of amino acids and glycolysis/gluconeogenesis; cellular processes, e.g. focal adhesion; environmental information processing, e.g. HIF-1 signalling and extracellular matrix (ECM)-receptor interaction (Figure 10). As expected, the human diseases classification consists of cancer related pathways. GO pathway enrichment reveals that the mRNAs might be involved in biological processes, e.g. angiogenesis/ hypoxia response, ECM organization, glycolysis, and cell proliferation (Appendix E). KEGG and GO pathway enrichment analysis both suggest involvement of target mRNAs in hypoxia signalling and angiogenesis. Also, both analyses suggest involvement in glycolysis/gluconeogenesis and signalling through ECM binding. By focussing on the hypoxia response/angiogenesis section of the KEGG pathway “Pathways in cancer”, it appears that sustained angiogenesis is promoted by upregulation of Glut1, HPH, VEGF, TGFα and TGFβ due to absence of miR-429, miR-200b-3p, miR-200c-3p, miR-106-5p, miR-362-3p, miR-203a-3p, miR-141-5p, miR-181-5p and miR-30-5p (Appendix F). HIF-β, which lays upstream of Glut1, HPH, VEGF, TGFα and TGFβ might be downregulated by miR-181b-5p. However, the downregulation of HIF-β appears to have no effect on the expression of mRNAs for Glut1, HPH, VEGF, TGFα and TGFβ.

Upregulated miRNAs Downregulated miRNAs

(15)

15 Another approach is to perform a pathway enrichment analysis separately for the ten downregulated and ten upregulated miRNAs with the highest number of target mRNA. KEGG pathway analysis reveals involvement of mRNA targets for the downregulated miRNAs in cell cycle, focal adhesion, p53 signalling, PI3-Akt signalling, HIF-1 signalling, NF-kappa b signalling, glycolysis, etc (Figure 11A). For the mRNA targets of upregulated miRNAs, KEGG pathway enrichment analysis suggest involvement in metabolic pathways such as: propanoate metabolism; valine, leucine, and isoleucine degradation; carbon metabolism (Figure 11B).

In conclusion, upregulated and downregulated mRNAs in CCRCC might be involved in angiogenesis/hypoxia response, ECM-receptor interaction, glycolysis, propanoate metabolism and cell proliferation by absence or presence of miRNA-directed degradation, respectively. In addition, separate pathway enrichment reveals involvement in pathways such as cell cycle, focal adhesion, p53 signalling, PI3-Akt signalling, HIF-1 signalling, NF-kappa b signalling and glycolysis. For both pathway enrichment approaches angiogenesis/hypoxia response appears to be promoted by absence of miRNA-directed degradation of Glut1, HPH, VEGF, TGFα and TGFβ. For complete KEGG and GO result tables including miRNA involvement, see Supplemental Excel Dataset 3.

Figure 10. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of mRNAs targeted by the ten downregulated or the ten upregulated miRNAs with highest degrees. The cut-off was Benjamini-Hochberg adjusted p-value of -0.05 or less.

(16)

16

Figure 11. Separate Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of only mRNAs targeted by the ten downregulated (A) or the ten upregulated miRNAs (B) with highest degrees. The cut-off was Benjamini-Hochberg adjusted p-value of -0.05 or less.

KEGG pathway enrichment: targets of downregulated miRNAs A

KEGG pathway enrichment: targets of upregulated miRNAs B

(17)

17

Discussion

This study was set out to identify candidate mRNAs, lncRNAs or miRNAs therapeutic targets in clear cell renal cell carcinoma (CCRCC) by analysis of the TCGA-KIRC dataset using “R”. Inspection of the TCGA-KIRC datasets confirmed separation of normal and tumour tissue types based on mRNA, lncRNA, and miRNA expression, with subgrouping in normal samples. Paired differential expression analysis revealed 4861 mRNAs, 2131 lncRNAs and 140 precursor miRNAs differentially expressed in CCRCC tumours compared to normal tissue. Interaction construction suggested that 755 mRNAs might be upregulated because of downregulation of 91 miRNAs, thereby evading miRNA-directed degradation. On the contrary, 934 mRNAs might be downregulated because of miRNA-directed degradation by 56 upregulated miRNAs. It was found that miR-155-5p has the highest number of putative mRNA targets and might downregulate up to 293 mRNAs in CCRCC. It appeared that miR-200b might be co-targeting mRNAs with miR-429 and miR-200c-3p. KEGG and GO analysis suggested that the ten downregulated and ten upregulated miRNAs with the highest number of mRNA targets might be involved in HIF-1 signalling, extracellular matrix (ECM)-receptor interaction, glycolysis, propanoate metabolism and cell proliferation, etc..

The micro-RNA-200 family as suppressors for angiogenesis in CCRCC.

Upregulated downregulated mRNAs in CCRCC might be involved in HIF-1 signalling, ECM-receptor interaction, glycolysis, propanoate metabolism and cell proliferation by absence or presence of miRNA-directed degradation, respectively (Figure 10 & Appendix E). The mRNA targets of downregulated miRNAs with high number of miRNA targets appear to be involved in cell cycle, focal adhesion, p53 signalling, PI3-Akt signalling, HIF-1 signalling, NF-kappa b signalling, glycolysis, etc (Figure 11A). Whereas the mRNA targets of upregulated miRNAs with high number of miRNA targets appear to be involved in metabolic pathways such as: propanoate metabolism; valine, leucine, and isoleucine degradation; carbon metabolism (Figure 11B). The analysis of all differentially expressed mRNAs together or the analysis of up- and downregulated mRNAs separately both suggest involvement in signalling of HIF-1 , a master regulator of oxygen homeostasis (Semenza et al., 1998). HIF-1 mediated hypoxia response/angiogenesis appears to be promoted by absence of miRNA-directed degradation of Glut1, HPH, VEGF, TGFα, and TGFβ. VEGF and TGFβ had the most putative interactions with miRNAs, VEGF had putative interactions with miR-106-5p, miR-141-5p, miR-200b-3p, miR-200c-3p, miR-362-3p, and miR-429, whereas TGFβ had putative interactions with miR-141-5p, miR-200b-3p, miR-200c-3p, and miR-429 (Appendix F). Interestingly, network construction analysis revealed that miR-200b-3p might be co-regulating mRNAs with miR-429 and miR-200c-3p. This trio belong to a functional group of the micro-RNA-200 family (Humphries et al., 2015). Choi et al. (2011) suggested that miR-200b negatively regulates VEGF signalling by targeting VEGF and its receptors and that miR-200b may have therapeutic potential as an angiogenesis inhibitor. The importance of an angiogenesis inhibitor as therapeutic CCRCC or other RCCs could be highlighted by the fact that sunitinib, a VEGFR inhibitor, is currently the standard first-line systemic therapy for RCC. Humphries et al. (2015) also report that the micro-RNA-200 family is involved in epithelial-to-mesenchymal transition, invasiveness, and apoptosis, but this not yet investigated in CCRCC. Furthermore, the miRNAs involved in ECM-receptor interaction, glycolysis, propanoate metabolism, cell proliferation, etc. require more in-depth analysis for therapeutic miRNA prediction. Taken together, miR-200b-3p, miR-200c-3p and miR-429 pose as candidates for therapeutic targets in CCRCC, as they might suppress angiogenesis through VEGF and TGFβ suppression.

(18)

18 High variance and subgrouping of normal tissue in the TCGA-KIRC dataset could introduce noise and a confounder into the analysis, respectively.

Exploration and visualisation of the TCGA-KIRC RNA-sequencing datasets revealed heterogeneity among the normal tissue samples. Furthermore, the first principal component, which separates tumour tissue from normal tissue, only explained about 22-27% of the total variant (Figure 4). This indicates that there are more factors explaining the variance, which introduce noise into the analysis. In addition, hierarchical clustering established subgrouping within the normal tissue groups according to mRNA, lncRNA and miRNA expression (Figure 5). The subgroups consisted of a large subgroup and a small subgroup with a ratio of 5:2, respectively. A small portion of the large subgroup appeared to have a high correlation with the small subgroup as well. According to mRNA expression the major cluster could be divided into 4 sub clusters, resulting in a total of 5 subgroups. Lindgren et al. (2017) also reported heterogeneity within normal kidney tissue according to gene expression datasets from TCGA-KICH, TCGA-KIRC, and TCGA-KIRP. They also reported that the normal kidney tissue could be stratified into five subgroups. The first, second and third subgroups were found to be correlated with genes representing the cortical, medullary, cortico-medullary junction renal tissues, respectively, whereas the fourth and fifth subgroups were associated with gene signatures of renal aging, inflammation, and kidney transplant rejection. The cellular composition could be an explanation for the subgrouping found in normal tissue according to mRNA, lncRNA and miRNA expression in the current study. Perhaps the two main clusters represent cortex and medulla renal tissues; the subgroup in the major cluster showing correlation to both groups might be tissue originating from the cortico-medullary junction. The renal origin of normal tissue might have an impact on the analysis, as CCRCC is derived from cells belonging to the proximal tubule, which is mainly located in the renal cortex (Motzer et al., 1996). Comparison of CCRCC with non-progenitors of CCRCC could contaminate the results with non-CCRCC characteristic RNAs. Altogether, the high variance explained by unknown variables introduces noise into the analysis. Also, the subgrouping in normal kidney tissue could be due to cellular composition of cortex, medulla, or cortico-medullary junction renal tissues, which might introduce a confounder into the analysis.

Paired analysis of TCGA-KIRC data decreases the sample size but has a high statistical power Exploration and visualization of the TCGA-KIRC RNA-sequencing datasets revealed no significant subgrouping of CCRCC tumour samples according to mRNA, lncRNA or miRNA expression. In contrary, the cancer genome atlas research network (2013) established that the clear cell carcinomas in the TCGA-KIRC dataset may be categorized into four subtypes, correlating with prognosis, based on mRNA and miRNA expression. These contrasting findings might be the result of sample size differences. The present study only contained 72 and 71 CCRCC samples for mRNA and miRNA expression, respectively, whereas the TCGA research network study contained 417 CCRCC samples. The susceptibility of a PCA is affected by small sample sizes if the dataset is heterogeneous (Lenz, et al., 2016). Analysis of prognosis among CCRCC tumours subgroups according to mRNA, lncRNA or miRNA expression, could lead to the discovery of therapeutic targets important for survival among CCRCC patients. Taken together, the current study might have missed important mRNAs, lncRNAs, miRNAs correlating with CCRCC patient survival due to the small sample size. However, increasing sample size of CCRCC tumours requires a less paired design approach. The lower the number of paired samples in analysis the lower the power of the results becomes (Stevens et al., 2018).

Analysis of present and absent miRNA-directed degradation of mRNAs.

This study was set out to identify candidate mRNAs, lncRNAs or miRNAs therapeutic targets in clear cell renal cell carcinoma (CCRCC). Unfortunately, the transcriptomics approach of this study makes it impossible to further study the interactions of miRNAs by blockage of translation. For the same reason,

(19)

19 interactions of lncRNAs acting as miRNA decoys, competitors for mRNA binding and precursors for miRNAs could not be studied. miRNA-directed degradation of lncRNAs was not studied due to time constraints. It must be noted that the miRNA-mRNA interactions found by current study are putative and that the only validation was a correlation analysis whereby miRNA-mRNA pairs with a Pearson coefficient of -0.5 were considered significant. However, the mRNAs and miRNAs of the pairs were already found to be inversely differentially expressed in CCRCC. This might have introduced a bias into the pairing analysis. Therefore, the results obtained after RNA-pairing should be considered as putative. A further point of criticism is the focus of this study on the analysis of differentially expressed RNAs together, as opposed to the analysis of up- and downregulated RNAs separately. Hong et al. (2014) reported that genes within a pathway tend to have positively correlated expression levels, which could result in an imbalance between the up- and downregulated genes in particular pathways. Consequently, analysis of all the differentially expressed together greatly reduces the power to detect significant pathways. Time constraints limited a comprehensive pathway enrichment analysis of upregulated and downregulated mRNA targets separately. All in all, this study only analysed presence and absence of miRNA-directed degradation of mRNAs in CCRCC. However, these interactions are putative and require more comprehensive validation. The functionality of these putative miRNA-mRNA interaction was mainly assessed by pathway enrichment analysis of upregulated and downregulated mRNA targets together. For more statistical power, a pathway enrichment analysis of upregulated and downregulated mRNA targets should be performed separately.

Prior mRNA-lncRNA-miRNA network analysis in CCRCC.

Zhu et al. (2018) also performed an analysis on the mRNA-lncRNA-miRNA network in CCRCC using the TCGA-KIRC dataset. They used the package “edgeR” for differential expression analysis instead of “DESeq2” and found a total of 5758 mRNAs, 179 miRNAs, and 4246 lncRNAs in CCRCC, among which 3895 mRNAs, 117 miRNAs, and 3116 lncRNAs were upregulated, while 1863 mRNAs, 62 miRNAs, and 1130 lncRNAs were downregulated. The high amount of upregulated RNAs compared to downregulated RNAs is in contradiction with the results of the current study, where differentially expressed RNAs appear to be predominately downregulated. This contradiction could be explained by the use of different samples within the TCGA-KIRC dataset. The current study had a fully paired design, whereas Zhu et al. (2018) used: 539 KIRC samples and 72 matched normal for analysis of mRNAs and lncRNAs; 545 KIRC samples and 71 matched normal samples for analysis of miRNAs. It is likely that this unpaired design compromised results in terms of statistical power as well as accuracy of fold change (Stevens et al., 2018). Furthermore, Zhu et al. (2018) claim to have established putative lncRNA– miRNA–mRNA competing triplets using the TCGA-KIRC dataset. For a co-expressed lncRNA-mRNA pair, both lncRNA and mRNA in this pair were targeted by and co-expressed inversely with a miRNA. However, co-expression cannot be used for validation of competition between RNAs. The suppression of miRNA-directed degradation by binding of miRNAs to lncRNAs would not result in a decrease of expression of that bound miRNA. Only an indirect relation between lncRNA and miRNA could be obtained by co-expression analysis. Zhu et al. (2018) reported involvement of target mRNAs in GO terms such as angiogenesis, regulation of transcription from RNA polymerase II promoter, and ECM organization, KEGG pathways such as proteoglycans in cancer, calcium signalling pathway, and rap1 signalling pathway. Although they found similar pathways like angiogenesis and ECM organization, it is likely that the they mRNAs used for pathway analysis do not resemble mRNAs in the lncRNA–miRNA– mRNA competing triplets in CCRCC.

miRNAs and lncRNAs as therapeutics for CCRCC

Ideally, revival of tumour suppressive miRNAs would suppress tumour proliferation. For example, revival of downregulated miRNAs such as the micro-RNA-200 family, might suppress

(20)

20 angiogenesis/hypoxia response in CCRCC tumours by miRNA-directed degradation of among other VEGF and TGB-β. In addition, if downregulated lncRNAs which act as decoys for upregulated miRNAs are known, they could in theory be revived to act as tumour suppressors by suppressing oncogenic miRNAs that degrade tumour suppressive mRNAs. Perhaps in the future a cocktail of miRNAs and lncRNAs could be used as therapy for CCRCC. However, further comprehensive analysis and validation of the mRNA, lncRNA and miRNA network is necessary before therapeutic implementation of RNAs in CCRCC. Another obstacle in the use of RNAs as treatment would be specificity, as supplementation of miRNAs and lncRNAs to normal cells could have side-effects. A solution for this problem would be drug delivery by nanoparticles that expose ligands for CCRCC specific receptors. These specific receptors could also be identified by an RNA-sequencing analysis of CCRCC.

Conclusion

This study analysed the TCGA-KIRC RNA-sequencing dataset for the identification of candidate mRNAs, lncRNAs or miRNAs therapeutic targets in CCRCC. The micro-RNA-200 family members miR-200b-3p, miR-200c-3p and miR-429 pose as candidates for therapeutic targets in CCRCC, as they might suppress hypoxia response/angiogenesis through VEGF and TGFβ suppression. Further comprehensive analysis of the TCGA-KIRC dataset is needed to elucidate the roles of miR-200b-3p, miR-200c-3p and miR-429 as well as other RNAs in CCRCC.

(21)

21

Supplemental data

The complete lists of differentially expressed mRNAs, lncRNAs and miRNAs are available as “Supplemental Excel Dataset 1”. In addition, lists of miRNA and mRNA interactions are available as “Supplemental Excel Dataset 2”. The KEGG and GO result tables are available as “Supplemental Excel Dataset 3”.

The following R scripts are available: “TCGA-KIRC mRNA and lncRNA analysis”, “TCGA-KIRC miRNA analysis” and “miRNA-mRNA pairs correlation analysis”. The R packages used for analysis are shown in table 2. The complete R session information is available as “Supplemental Session Information”. Table 2. R packages used for analysis

R package Version References

apeglm 1.9.1 Zhu et al., 2018; 10.18129/B9.bioc.apeglm

biomaRt 2.43.6 Durinck et al., 2005; Durick et al, 2009; 10.18129/B9.bioc.biomaRt dendextend 1.13.4 Galili et al., 2015;

https://cran.r-project.org/web/packages/dendextend DESeq2 1.27.32 Love et al., 2014, 10.18129/B9.bioc.DESeq2

dplyr 0.8.5 Wickham et al., 2018; https://cran.r-project.org/web/packages/dplyr forcats 0.5.0 Wickham, 2020; https://cran.r-project.org/web/packages/forcats ggplot2 3.3.0 Wickham et al., 2016; https://ggplot2.tidyverse.org

janitor 2.0.1 Sam, 2020; https://cran.r-project.org/web/packages/janitor multiMiR 1.10.0 Ru et al., 2014; 10.18129/B9.bioc.multiMiR

NMF 0.22.0 Gaujoux et al., 2010; https://cran.r-project.org/web/packages/NMF RColorBrewer 1.1-2 Neuwirth, 2014;

https://cran.r-project.org/web/packages/RColorBrewer

rlist 0.4.6.1 Ren, 2016; https://cran.r-project.org/web/packages/rlist SummarizedExperiment 1.17.5 Morgan et al., 2020; 10.18129/B9.bioc.SummarizedExperiment

TCGAbiolinks 2.15.3 Colaprico et al., 2015; Silva et al., 2016; Mounir et al., 2019;

10.18129/B9.bioc.TCGAbiolinks

textshape 1.7.1 Rinker, 2020; https://github.com/trinker/textshape tidyverse 1.3.0 Wickham et al., 2019; https://www.tidyverse.org

(22)

22

References

Acunzo, M., Romano, G., Wernicke, D., & Croce, C. M. (2015). MicroRNA and cancer--a brief overview. Advances in biological regulation. 57, 1–9. DOI: 10.1016/j.jbior.2014.09.013

Agarwal, V., Bell, G. W., Nam, J. W., & Bartel, D. P. (2015). Predicting effective microRNA target sites in mammalian mRNAs. eLife, 4, e05005. DOI: 10.7554/eLife.05005

Al-Marhoon, M. S., Osman, A. M., Kamal, M. M., & Shokeir, A. A. (2011). Incidental vs symptomatic renal tumours: Survival outcomes. Arab journal of urology. 9, 17–21. DOI: 10.1016/j.aju.2011.03.006

Ambros, V., Bartel, B., Bartel, D. P., Burge, C. B., Carrington, J. C., Chen, X., et al. (2003). A uniform system for microRNA annotation. RNA, 9, 277–279. DOI: 10.1261/rna.2183803

Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene Ontology: tool for the unification of biology. Nat Genet. 25, 25–29. DOI: 10.1038/75556

Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. 57, 289-300.

Cancer Genome Atlas Research Network (2013). Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature. 499, 43–49. DOI: 10.1038/nature12222

Chen, Y., McCarthy, D., Ritchie, M., Robinson, M., & Smyth, G. (2020). edgeR: differential analysis of sequence read count data User’s Guide.

Chen, Y., & Wang, X. (2020). miRDB: an online database for prediction of functional microRNA targets. Nucleic Acids Research. 48, D127–D131. DOI: 10.1093/nar/gkz757

Cheville, J. C., Lohse, C. M., Zincke, H., Weaver, A. L., & Blute, M. L. (2003). Comparisons of outcome and prognostic features among histologic subtypes of renal cell carcinoma. The American journal of surgical pathology. 27, 612–624. DOI: 10.1097/00000478-200305000-00005

Choi, Y. C., Yoon, S., Jeong, Y., Yoon, J., & Baek, K. (2011). Regulation of vascular endothelial growth factor signaling by miR-200b. Molecules and cells. 32, 77–82. DOI: 10.1007/s10059-011-1042-2 Chou, C. H., Shrestha, S., Yang, C. D., Chang, N. W., Lin, Y. L., Liao, K. W., et al. (2018). miRTarBase

update 2018: a resource for experimentally validated microRNA-target interactions. Nucleic acids research, 46. D296–D302. DOI: 10.1093/nar/gkx1067

Colaprico, A., Silva, C. T., Olsen, C., Garofano, L., Cava, C., Garolini, D., et al. (2015). TCGAbiolinks: An R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Research. DOI: 10.1093/nar/gkv1507

Gebhard, R. L., Clayman, R. V., Prigge, W. F., Figenshau, R., Staley, N. A., Reesey, C., & Bear, A. (1987). Abnormal cholesterol metabolism in renal clear cell carcinoma. Journal of lipid research. 28, 1177–1184.

Durinck, S., Moreau, Y., Kasprzyk, A., Davis, S., De Moor, B., Brazma A., & Huber, W. (2005). BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 21, 3439–3440. DOI: 10.1093/bioinformatics/bti525

Durinck, S., Spellman, P., Birney, E., & Huber, W. (2009). Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nature Protocols. 4, 1184–1191. DOI: 10.1038/nprot.2009.97

Galili, T. (2015). dendextend: an R package for visualizing, adjusting, and comparing trees of hierarchical clustering. Bioinformatics. DOI: 10.1093/bioinformatics/btv428

Gaujoux, R., & Seoighe, C. (2010). A flexible R package for nonnegative matrix factorization. BMC Bioinformatics. 11, 1471-2105. DOI: 10.1186/1471-2105-11-367

(23)

23 Glynn D., G., Sherman, B. T., Hosack, D. A., Yang, J., Baseler, M. W., Lane, H. C., et al. (2003). DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biology. 4, P3. DOI: 10.1186/gb-2003-4-5-p3

Ha, M., & Kim, V. N. (2014). Regulation of microRNA biogenesis. Nature reviews. Molecular cell biology. 15, 509–524. DOI: 10.1038/nrm3838

He, J. H., Han, Z. P., Zou, M. X., Wang, L., Lv, Y. B., Zhou, J. B., Cao, M. R., & Li, Y. G. (2018). Analyzing the LncRNA, miRNA, and mRNA Regulatory Network in Prostate Cancer with Bioinformatics Software. Journal of computational biology : a journal of computational molecular cell biology. 25, 146–157. DOI: 10.1089/cmb.2016.0093

Huang, D.W., Sherman, B. T. & Lempicki R. A. (2009). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1-13. DOI: 10.1093/nar/gkn923

Huang, D.W., Sherman, B. T. & Lempicki R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protoc. 4, 44-57. DOI: 10.1038/nprot.2008.211

Huang, Q. R., & Pan, X. B. (2019). Prognostic lncRNAs, miRNAs, and mRNAs Form a Competing Endogenous RNA Network in Colon Cancer. Frontiers in oncology. 9, 712. DOI: 10.3389/fonc.2019.00712

Humphries, B., & Yang, C. (2015). The microRNA-200 family: small molecules with novel roles in cancer development, progression and therapy. Oncotarget. 6, 6472–6498. DOI: 10.18632/oncotarget.3052

Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: the forecast package for R. Journal of Statistical Software, 26, 1–22. DOI: 10.18637/jss.v027.i03.

Hyndman, R. J., Athanasopoulos, G., Bergmeir, C., Caceres, G., Chhay, L., O'Hara-Wild, M., et al. (2020). forecast: Forecasting functions for time series and linear models. R package version 8.12, http://pkg.robjhyndman.com/forecast.

Kanehisa, M. & Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27-30. DOI: 10.1093/nar/28.1.27

Kanehisa, M., Sato, Y., Furumichi, M., Morishima, K., & Tanabe, M. (2019). New approach for understanding genome variations in KEGG. Nucleic Acids Res. 47, D590-D595. DOI: 10.1093/nar/gky962

Kanehisa, M. (2019). Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28, 1947-1951. DOI: 10.1002/pro.3715

Kanehisa, M., & Sato, Y. (2020). KEGG Mapper for inferring cellular functions from protein sequences. Protein Sci. 29, 28-35.

Karagkouni, D., Paraskevopoulou, M. D., Chatzopoulos, S., Vlachos, I. S., Tastsoglou, S., Kanellos, I., et al. (2018). DIANA-TarBase v8: a decade-long collection of experimentally supported miRNA-gene interactions. Nucleic acids research, 46, D239–D245. DOI:10.1093/nar/gkx1141

Karagkouni, D., Paraskevopoulou, M. D., Tastsoglou, S., Skoufos, G., Karavangeli, A., Pierros, V., et al. (2020). DIANA-LncBase v3: indexing experimentally supported miRNA targets on non-coding transcripts, Nucleic Acids Research. 48, D101–D110. DOI: 10.1093/nar/gkz1036

Koeffler, H. P., McCormick, F., & Denny, C. (1991). Molecular mechanisms of cancer. The Western journal of medicine. 155, 505–514.

(24)

24 Li W. (2012). Volcano plots in analyzing differential expressions with mRNA microarrays. Journal of

bioinformatics and computational biology. 10, 1231003. DOI: 10.1142/S0219720012310038 Lindgren, D., Eriksson, P., Krawczyk, K., Nilsson, H., Hansson, J., Veerla, S., et al. (2017).

Cell-Type-Specific Gene Programs of the Normal Human Nephron Define Kidney Cancer Subtypes. Cell reports, 20, 1476–1489. DOI: 10.1016/j.celrep.2017.07.043

Liu, H., Brannon, A. R., Reddy, A. R., Alexe, G., Seiler, M. W., Arreola, et al. (2010). Identifying mRNA targets of microRNA dysregulated in cancer: with application to clear cell Renal Cell Carcinoma. BMC Syst Biol. 4, 51. DOI: 10.1186/1752-0509-4-51

Liu, W., & Wang, X. (2019). Prediction of functional microRNA targets by integrative modeling of microRNA binding and target expression data. Genome Biol. 20, 18. DOI: 10.1186/s13059-019-1629-z

Love, M., I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 15, 550. DOI: 10.1186/s13059-014-0550-8. McKnight, W. (2014). Information Management: Strategies for Gaining a Competitive Advantage with

Data. Morgan Kaufmann. DOI: 10.1016/B978-0-12-408056-0.00012-6

Morgan, M., Obenchain, V., Hester, J., & Pagès, H. (2020). SummarizedExperiment: SummarizedExperiment container. R package version 1.17.5.

Motzer, R. J., Bander, N. H. & Nanus, D. M. (1996). Renal-Cell Carcinoma. The New England Journal of Medicine. 335, 865-875. DOI: 10.1056/NEJM199609193351207

Motzer, R. J., Agarwal, N., Beard, C., Bolger, G. B., Boston, B., Carducci, M. A., et al. (2009). NCCN clinical practice guidelines in oncology: kidney cancer. Journal of the National Comprehensive Cancer Network: JNCCN. 7, 618–630. DOI: 10.6004/jnccn.2009.0043

Motzer, R. J., Rini, B. I., McDermott, D. F., Frontera, O. A., Hammers, H. J., Carducci, M. A., et al. (2019). Nivolumab plus ipilimumab versus sunitinib in first-line treatment for advanced renal cell carcinoma: extended follow-up of efficacy and safety results from a randomised, controlled, phase 3 trial. The Lancet. Oncology. 20, 1370–1385. DOI: 10.1016/S1470-2045(19)30413-9 Mounir, M., Lucchetta, M., Silva, C. T., Olsen, C., Bontempi, G., Chen, X., et al. (2019). New

functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx. PLoS computational biology. 15, e1006701. DOI: 10.1371/journal.pcbi.1006701 Neuwirth, E. (2014). RColorBrewer: ColorBrewer Palettes. R package version 1.1-2.

https://CRAN.R-project.org/package=RColorBrewer

Pan, Y., Hu, J., Ma, J., Qi, X., Zhou, H., Miao, X., et al. (2018). MiR‐193a‐3p and miR‐224 mediate renal cell carcinoma progression by targeting alpha‐2,3‐sialyltransferase IV and the phosphatidylinositol 3 kinase/Akt pathway. Molecular Carcinogenesis. 57, 1067– 1077. DOI: 10.1002/mc.22826

Paraskevopoulou, M. D., Karagkouni, D., Vlachos, I. S., Tastsoglou, S., & Hatzigeorgiou, A. G. (2018) microCLIP super learning framework uncovers functional transcriptome-wide miRNA interactions, Nature communications. 9, 3601. DOI: 10.1038/s41467-018-06046-y

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Ren, K. (2016). rlist: A Toolbox for Non-Tabular Data Manipulation. R package version 0.4.6.1. https://CRAN.R-project.org/package=rlist

Ringnér, M. What is principal component analysis?. Nat Biotechnol. 26, 303–304 (2008). DOI: 10.1038/nbt0308-303

(25)

25 Rinker, T. W. (2020). textshape: Tools for Reshaping Text version 1.7.1.

https://github.com/trinker/textshape

Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.” Bioinformatics. 26, 139-140. DOI: 10.1093/bioinformatics/btp616.

Ru., Y., Kechris, K. J., Tabakoff, B., Hoffman., P., Radcliffe, R. A., Bowler, R., et al. (2014). The multiMiR R package and database: integration of microRNA–target interactions along with their disease and drug associations. Nucleic Acids Research. 42, e133. DOI: 10.1093/nar/gku631

Sam F. (2020). janitor: Simple Tools for Examining and Cleaning Dirty Data. R package version 2.0.1. https://CRAN.R-project.org/package=janitor

Schmid, T. A., & Gore, M. E. (2016). Sunitinib in the treatment of metastatic renal cell carcinoma. Therapeutic advances in urology. 8, 348–371. DOI: 10.1016/j.ctrv.2010.08.005 Semenza G. L. (1998). Hypoxia-inducible factor 1: master regulator of O2 homeostasis. Current opinion

in genetics & development. 8, 588–594. DOI: 10.1016/s0959-437x(98)80016-6

Setten, R. L., Rossi, J. J., & Han, S. P. (2019). The current state and future directions of RNAi-based therapeutics. Nature reviews. Drug discovery. 18, 421–446. DOI: 10.1038/s41573-019-0017-4 Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. (2003). Cytoscape: a software

environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498-504. DOI 10.1101/gr.1239303

Silva, C. T, Colaprico, A., Olsen, C., D'Angelo, F., Bontempi, G., Ceccarelli, M., et al. (2016). TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages.” F1000Research, 5.

Stevens, J. R., Herrick, J. S., Wolff, R. K., & Slattery, M. L. (2018). Power in pairs: assessing the statistical value of paired samples in tests for differential expression. BMC Genomics. 19, 953 DOI: 10.1186/s12864-018-5236-2

The American Cancer Society. (2019). Cancer Facts & Figures 2019. Atlanta: American Cancer Society. The American Cancer Society. (2020). Survival Rates for Kidney Cancer. Retrieved June 1, 2020, from https://www.cancer.org/cancer/kidney-cancer/detection-diagnosis-staging/survival-rates.html The Gene Ontology Consortium. (2019) The Gene Ontology Resource: 20 years and still GOing strong.

Nucleic Acids Res. 47, D330-D338. DOI: 10.1093/nar/gky1055

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Wickham, H., François, R., Henry, L., & Müller, K. (2018). dplyr: A Grammar of Data Manipulation. R package version 0.7.6. https://CRAN.R-project.org/package=dplyr.

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R. (2019). Welcome to the tidyverse. Journal of Open Source Software. 4, 1686, DOI: 10.21105/joss.01686

Wickham, H. (2020). forcats: Tools for Working with Categorical Variables (Factors). R package version 0.5.0. https://CRAN.R-project.org/package=forcats

Xiao, B., Zhang, W., Chen, L., Hang, J., Wang, L., Zhang, R., et al. (2018). Analysis of the miRNA-mRNA-lncRNA network in human estrogen receptor-positive and estrogen receptor-negative breast cancer based on TCGA data. Gene. 658, 28–35. DOI: 10.1016/j.gene.2018.03.011

Yoon, J. H., Abdelmohsen, K., & Gorospe, M. (2014). Functional interactions among microRNAs and long noncoding RNAs. Seminars in cell & developmental biology. 34, 9–14. DOI: 10.1016/j.semcdb.2014.05.015

(26)

26 Zhu A., Ibrahim J. G. & Love M. I. (2018). Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics. DOI: 10.1093/bioinformatics/bty895.

Zhu, H., Lu, J., Zhao, H., Chen, Z., Cui, Q., Lin, Z., et al. (2018). Functional Long Noncoding RNAs (lncRNAs) in Clear Cell Kidney Carcinoma Revealed by Reconstruction and Comprehensive Analysis of the lncRNA-miRNA-mRNA Regulatory Network. Medical science monitor : international medical journal of experimental and clinical research. 24, 8250–8263. DOI: 10.12659/MSM.910773

(27)

27

Appendix

Appendix A – Scree plots of the PCA analysis on mRNA, lncRNA and miRNA datasets

Figure A1. Scree plots based on PCA analysis of vst normalized mRNA expression (A), lncRNA expression (B) and miRNA expression (C). The scree plot shows the amount of variance is explained by the first ten principal components.

mRNA lncRNA

miRNA

A B

(28)

28 Appendix B – Table of most significant Differentially expressed mRNAs

Top ten upregulated and top ten downregulated differentially RNAs based on Benjamini-Hochberg adjusted p-value (padj).

Ensemble ID Gene LFC padj type reg

1 ENSG00000107159 CA9 9.682323 < 5E-324 mRNA up

2 ENSG00000050767 COL23A1 5.884028 3.308E-192 mRNA up 3 ENSG00000123610 TNFAIP6 6.971698 4.658E-189 mRNA up

4 ENSG00000125726 CD70 8.200687 1.985E-179 mRNA up

5 ENSG00000073060 SCARB1 4.075708 1.082E-174 mRNA up 6 ENSG00000061656 SPAG4 4.142734 2.859E-168 mRNA up 7 ENSG00000129521 EGLN3 4.370207 5.101E-168 mRNA up

8 ENSG00000106236 NPTX2 7.277697 1.17E-167 mRNA up

9 ENSG00000082684 SEMA5B 3.850371 1.231E-142 mRNA up 10 ENSG00000095970 TREM2 4.847084 1.099E-134 mRNA up 1 ENSG00000113889 KNG1 -12.232629 1.753E-302 mRNA down 2 ENSG00000074803 SLC12A1 -11.247272 2.47E-261 mRNA down 3 ENSG00000116039 ATP6V1B1 -8.005648 1.22E-234 mRNA down 4 ENSG00000105929 ATP6V0A4 -9.901962 3.695E-231 mRNA down

5 ENSG00000113905 HRG -9.550404 1.283E-222 mRNA down

6 ENSG00000086159 AQP6 -9.14567 9.405E-217 mRNA down

7 ENSG00000014257 ACP3 -5.766391 4.482E-205 mRNA down 8 ENSG00000004939 SLC4A1 -9.108182 3.28E-204 mRNA down 9 ENSG00000095203 EPB41L4B -5.711401 2.041E-201 mRNA down 10 ENSG00000104332 SFRP1 -7.201392 5.575E-199 mRNA down

(29)

29 Appendix C – Table of most significant Differentially expressed lncRNAs

Top ten upregulated and top ten downregulated differentially lncRNAs based on Benjamini-Hochberg adjusted p-value (padj).

miRNA ID LFC padj regulation

1 hsa-mir-210 3.779839 3.026E-137 up 2 hsa-mir-122 6.541804 1.351E-135 up 3 hsa-mir-106b 1.129125 3.322E-106 up 4 hsa-mir-15a 1.056156 4.7702E-81 up 5 hsa-mir-2355 1.326614 3.4787E-77 up 6 hsa-mir-155 3.098322 6.4922E-73 up 7 hsa-mir-21 2.16535 5.4985E-68 up 8 hsa-mir-224 2.382382 8.5422E-60 up 9 hsa-mir-629 1.514639 2.7231E-56 up 10 hsa-mir-584 1.744319 1.7153E-55 up 1 hsa-mir-200c -5.82289 7.417E-251 down 2 hsa-mir-141 -5.990057 1.154E-207 down 3 hsa-mir-429 -2.553949 4.5547E-95 down 4 hsa-mir-200b -2.096 4.4347E-88 down 5 hsa-mir-891a -6.580708 1.6043E-87 down 6 hsa-mir-362 -2.948058 7.1614E-85 down 7 hsa-mir-184 -5.993725 6.3063E-72 down 8 hsa-mir-660 -1.784758 3.8317E-68 down 9 hsa-mir-874 -1.801429 5.6854E-66 down 10 hsa-mir-500a -1.819782 5.1912E-65 down

(30)

30 Appendix D – Table of most significant Differentially expressed miRNAs

Top ten upregulated and top ten downregulated differentially miRNAs based on Benjamini-Hochberg adjusted p-value (padj).

miRNA ID LFC padj regulation

1 hsa-mir-210 3.779839 3.026E-137 up 2 hsa-mir-122 6.541804 1.351E-135 up 3 hsa-mir-106b 1.129125 3.322E-106 up 4 hsa-mir-15a 1.056156 4.7702E-81 up 5 hsa-mir-2355 1.326614 3.4787E-77 up 6 hsa-mir-155 3.098322 6.4922E-73 up 7 hsa-mir-21 2.16535 5.4985E-68 up 8 hsa-mir-224 2.382382 8.5422E-60 up 9 hsa-mir-629 1.514639 2.7231E-56 up 10 hsa-mir-584 1.744319 1.7153E-55 up 1 hsa-mir-200c -5.82289 7.417E-251 down 2 hsa-mir-141 -5.990057 1.154E-207 down 3 hsa-mir-429 -2.553949 4.5547E-95 down 4 hsa-mir-200b -2.096 4.4347E-88 down 5 hsa-mir-891a -6.580708 1.6043E-87 down 6 hsa-mir-362 -2.948058 7.1614E-85 down 7 hsa-mir-184 -5.993725 6.3063E-72 down 8 hsa-mir-660 -1.784758 3.8317E-68 down 9 hsa-mir-874 -1.801429 5.6854E-66 down 10 hsa-mir-500a -1.819782 5.1912E-65 down

Referenties

GERELATEERDE DOCUMENTEN

Association of VEGF and VEGFR2 single nucleotide polymorphisms with hypertension and clinical outcome in metastatic clear cell renal cell carcinoma patients treated with

(a) Empirical Bayes estimates of clearance of sunitinib (CLp) and the active metabolite SU12662 (CLm) for base model stratified by different genotypes of single

Finally, we attempted to assess whether VHL loss, CDKN2A loss and MYC activation are dysregulated at tumour initiation or tumour progression, by analysing their relative frequency

In chapter 4, we describe a study in which we extensively analysed one ccRCC case using samples from multiple regions of the primary tumour, a sample from a tumour thrombus

For the detection of variants in specific genes, including known ccRCC cancer driver genes or even specific variants (i.e. hotspot analysis), targeted sequencing is more suitable

a Adjusted for age at diagnosis (years) and sex b Adjusted for age at diagnosis (years), sex, TNM stage, differentiation grade and tumor size c Adjusted for age at diagnosis

Genomic heterogeneity of clear cell renal cell carcinoma Ferronika,

The goal of this research is to benchmark the results of the gene expression reversal analysis of the tumour subtype and the individual tumour sample signatures against the results